1 Introduction

1.1  Executive  Summary 

This report describes the Phase I activities in Advanced Research in
Recognition of Handwritten Address ZIP Codes conducted for the
United States Postal Service at the Environmental Research Institute
of Michigan (ERIM) under contract 104230-86-H-004. These activities
include an in-depth review of the optical character recognition literature,
the development of a handwritten address digitized image data base, the
development of a hardware and software testbed for investigating the
recognition of handwritten addresses, and the design of a prototype
end-to-end ZIP Code recognition system. Beyond the scope originally
intended for Phase I, ERIM has implemented that end-to-end system and has
determined that it achieves ninety percent digit identification on limited
test data. A central concept throughout these activities is that the
development of image algorithms is an incremental process. This concept is
strongly reflected in the testbed architecture that has resulted from this
work. This approach is unique in that it enables continued system
refinement in a way that is both understandable and meaningful. A plan for
such refinement of the prototype system is proposed for Phase II of this
project.

1.2  Motivation 

Presently, handwritten mail that is successfully read by Phase II OCR
machines represents less than one half of one percent of all letter mail
sorted in the United States. Approximately 15% of all U.S. letter mail is
handwritten, of which only 4% was correctly identified by a Phase II OCR
machine in a recent test [1]. It is desirable to extend the ability of
optical character readers to include recognition of unconstrained
handwritten characters so that this 15% can be sorted automatically. To
minimize constraints on postal service users, techniques that impose
restrictions upon handwritten addresses, such as preprinted guide boxes,
will not be used. The present mail sorting system will therefore be
enhanced by an automated system capable of recognizing a large percentage
of unconstrained handwritten ZIP Codes.

1.3 Project Goals

The primary goal of this OCR project is to develop new techniques for the
automatic location and recognition of unconstrained handwritten ZIP Codes
in address blocks. A secondary goal is to develop techniques for increasing
the ZIP Code recognition reliability with city and state information from
the address block. To accomplish these goals, our research has been broken
into two phases. In the first phase, the major goal has been to develop a
prototype end-to-end system and a methodology for character recognition
that facilitates incremental improvements. The tasks required to complete
this system have been software development and hardware acquisition for a
handwritten address recognition testbed, and development of an image data
base. The second phase will improve the recognition rates achieved in the
first phase through an iterative process of testing and refining the
prototype algorithms developed during Phase I. The OCR methodology
developed in the first phase will provide the framework for algorithmic
improvements.

The  work  for  Phase  I  was  broken  into  the  following  tasks  and  completed 
as  summarized  here. 

1.3.1  Literature  Survey 

Become  familiar  with  performance  requirements  for  postal  OCRs  and  with 
previous  handwritten  address  recognition  system  efforts  through  a  literature 
survey  and  USPS  library  research. 

An extensive literature search has been performed, and a survey article
describing and categorizing previous techniques has been written [2]. This
survey also relates ERIM's methods to past techniques. A trip was made
to the USPS library in Washington, and to the Detroit post office. We would
like to thank Leonard Tomlinson, Industrial Engineering Coordinator at
Detroit's post office, who took the time to show us the sorting process for
letters and flats. We observed both hand letter sorting and automatic OCR
letter sorting processes. We were able to see first hand the types of
handwritten addresses that are rejected or successfully read by current OCR
machinery.

1.3.2 Image Data Base

Develop a handwritten address image data base for use in OCR system
development.

Over 800 handwritten address images have been added to our data base.
The data base includes the EKTRON images originally supplied to us, but
consists mostly of images digitized at the State University of New
York (SUNY). We expect to expand this OCR data base to over 2500 images
during Phase II.

1.3.3  Prototype  Recognition  System 

Develop prototype algorithms for image processing and character
classification that are used in a prototype end-to-end system, and develop a
methodology for improving these algorithms easily. The algorithms must
automatically locate the five- or nine-digit ZIP Code block, and then
recognize the characters within that block. They must also be extendable to
use information outside the ZIP Code block, such as City/State information,
to reduce the possible number of ZIP Code candidates.

Prototype algorithms to locate the ZIP Code block and recognize its
characters have been written and used to build a prototype OCR system. This
system has been tested using the handwritten address data base. The USPS
City/State/ZIP information data base has been read and will be used to rule
out nonexistent ZIP Codes, and may also be used to enhance ZIP Code
determination. These algorithms and the prototype recognition system will be
described later.

1.4  Project  Outcomes 

Phase I activities have produced four major outcomes: the OCR literature
review, the handwritten address digitized data base, the hardware and
software testbed for investigating the recognition of handwritten addresses,
and the prototype ZIP Code recognition system. The OCR literature review has
provided insight into past character recognition techniques, which has played
a major role in formulating an approach to this problem. The handwritten
address digitized data base, now over 800 images, has facilitated the
construction and initial testing of the end-to-end ZIP Code recognition
system. The hardware and software testbed features an ERIM Cytocomputer and
a Symbolics LISP Machine running several layers of software designed to
minimize vision algorithm development efforts. The prototype ZIP Code
recognition system consists of several phases: binary image generation, last
line extraction, character segmentation, feature generation, feature
segmentation, model matching, and ZIP Code assembly.

1.5  Future  Directions 

In Phase II, the testbed developed in Phase I will be used to refine the
prototype ZIP Code recognition system. Performance data from tests on
the image data base will be analyzed and used to focus subsequent research
on the approaches that promise the greatest increase in system performance.
Topics that will require further attention include extracting slanted
address lines, segmenting touching digits, windowing digits with overlapping
bounding boxes, refining and expanding digit models, adding last line data
base information into ZIP Code assembly, and adding context information
into ZIP Code hypotheses formulation.

Several improvements to the methods developed in Phase I can be made.
Binary image generation would benefit from algorithms that perform well
with textured backgrounds and from analysis of broken digit characters.
Last line extraction performance can be enhanced through techniques that
handle slanted address lines. Expansion of the current feature set, which
consists solely of concavity features, will enhance character models.
Finally, many improvements to ZIP Code assembly are planned. These include
looking for more than standard five-digit ZIP Code sequences, analysis of
last lines for missing ZIP Code information, and using additional contextual
address block information.

2 Phase I Overview


2.1  Literature  Review 

Optical character recognition is a relatively old field about which much
has been written. To help us achieve an understanding of some of the
fundamental issues associated with this field, a literature survey entitled
Methodologies of Optical Character Recognition, extending back to the
earliest reported work in character recognition, was performed early in
Phase I. Research at the USPS library was also conducted. Abstracts dealing
with optical and machine character recognition were collected by ERIM's
Information Center through the Dialog information services, specifically
through the Inspec and Compendex data bases. Papers included in the search
were chosen based upon both a perceived contribution to the theoretical and
practical aspects of OCR and a perceived uniqueness. Additional important
literature was gathered through the references of papers and articles that
had already been collected. Judging from the computer data bases and
references within papers, the final compilation of our literature search
represents a near-complete set of significant OCR literature.

Upon completion of this search we wrote a literature review that describes
and categorizes previous OCR methodologies and allows us to compare and
contrast ERIM's new methods with past OCR research. This literature
review divides OCR methodologies into three major steps: preprocessing
techniques, feature extraction, and classification methods. Preprocessing
techniques are divided into the following: data representational
conversion, thresholding, segmentation, normalization, skeletonization, and
filtering. Feature extraction methods are grouped into template matching and
transform methods, and topological and geometrical feature methods.
Classification methods are grouped into statistical pattern recognition,
syntactical pattern recognition, multilevel classification, contextual
analysis, and n-gram error detection and correction. A section on ERIM
methodologies is included that covers how our methods relate to previous
work in light of the categories described in the review. Finally, an
extensive annotated bibliography describing OCR literature referenced in the
review has been prepared.

This literature review is published separately as an ERIM document, and
accompanies this report. All literature collected for this review is
available at ERIM through Information Services.


2.2  Address  Data  Base 

We have compiled and are using a large image data base of address block
information; at present we have over 800 images. The Computer Science
Department of SUNY at Buffalo has been the major source of this data
base. SUNY has digitized images from a large number of sources including
Electrocom, Alcatel CGA-HBS, ERIM, the downtown Buffalo post office, the
USPS main office, and SUNY itself. ERIM supplied SUNY with approximately
700 envelopes to be digitized. These were collected from ERIM personnel at
both our Ann Arbor and Washington offices, but the great majority were
obtained from Community High School in Ann Arbor. Students were given
envelopes and random listings of American publishers' and post-secondary
institutions' addresses. They were requested to choose an address from their
list and write it on an envelope as if they were actually sending the
envelope through the mail. This effort was coordinated through Steve
Eisenberg of Community High.

SUNY has designed a sampling procedure to obtain 2500 images of
handwritten addresses. Two thousand of the samples are to be divided into
ten groups of 200; the ten groups represent the ten ZIP Code zones denoted
by the first numeral of the ZIP Code. The 200 addresses of each group are
to be distributed evenly over the states contained in the zone. The
remaining 500 addresses are to be divided into 25 groups containing 20
samples apiece. Each group represents a major U.S. city, and of the 20 in
each group, approximately half will be written in cursive and the
remainder will be handprinted. The images will be chosen to provide variety
in address composition. Addresses with ZIP, with ZIP+4, and with no ZIP,
and with explicit and abbreviated city and state names, will all be
represented.
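
The arithmetic of this sampling plan can be checked with a short sketch. The
Python below is illustrative only and is not part of the SUNY procedure; the
helper name per_state_quota and the six-state zone in the example are
assumptions made for the illustration.

```python
# Illustrative tally of the sampling plan; all names here are hypothetical.
ZONE_GROUPS = 10          # one group per first ZIP Code digit (0-9)
SAMPLES_PER_ZONE = 200
CITY_GROUPS = 25          # one group per major U.S. city
SAMPLES_PER_CITY = 20


def per_state_quota(samples, states_in_zone):
    """Split a zone's samples as evenly as possible across its states."""
    base, extra = divmod(samples, states_in_zone)
    # The first `extra` states receive one additional sample.
    return [base + 1 if i < extra else base for i in range(states_in_zone)]


zone_total = ZONE_GROUPS * SAMPLES_PER_ZONE      # 2000 zone samples
city_total = CITY_GROUPS * SAMPLES_PER_CITY      # 500 city samples
plan_total = zone_total + city_total             # 2500 images in all

# Example: a zone assumed to contain 6 states (the count is illustrative).
quota = per_state_quota(SAMPLES_PER_ZONE, 6)     # [34, 34, 33, 33, 33, 33]
```

When the number of states does not divide 200, an exactly even split is not
possible, so the sketch gives the remainder to the first few states.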

2.2.1  Digitization 

We have written a general command file, in the Digital Command Language
(DCL) of DEC VMS on the ERIM VAX, for converting SUNY's images to the
format that we require. At first we tried using the DCL command file
READCOMP given to us by SUNY and written by P. G. Mulgaonkar at SRI
International, but image files produced by this operation could not be read
at ERIM's computer site since a majority of our image processing software
requires files with fixed-length records. To address this problem, we wrote
a C program (fixed.c) to convert image files to fixed-length record format.
We also wrote our own command file that reads compressed image data from
SUNY's tape, converts the file to the format required by the compress.c
program, decompresses the file (compress.c), and then alters it using our
fixed.c program. All source code for these programs and instructions for
their use at ERIM are in Appendix C. We have completely automated the
process of handling new images given to us in SUNY's image standard; all
images are decompressed by running an overnight batch job.
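
The idea behind the fixed-length conversion can be illustrated as follows.
This Python sketch is not the fixed.c program of Appendix C; it only assumes
that each record holds one run of image bytes and that short records may be
padded with a fill byte. The function name and the choice of zero padding
are assumptions.

```python
def to_fixed_length(records, record_len, pad=b"\x00"):
    """Pad each variable-length record to exactly `record_len` bytes.

    Raises ValueError if a record is longer than `record_len`, because
    silently truncating image data would corrupt the picture.
    """
    fixed = []
    for rec in records:
        if len(rec) > record_len:
            raise ValueError("record longer than target length")
        fixed.append(rec + pad * (record_len - len(rec)))
    return fixed


# Two records of unequal length become two 4-byte records.
rows = [b"\x01\x02", b"\x03"]
fixed_rows = to_fixed_length(rows, 4)
```

After conversion every record has the same length, which is the property
that software expecting fixed-length records relies on.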


2.2.2  Truthing 

Tools for truthing the images have been developed during this project phase.
These tools allow the program developer to label segmented characters with
the correct information. This process creates a truth file for each
individually segmented image. Additional tools have been developed to log
the results of the recognition process. These tools take the results of the
matcher and write them to a log file. Other tools have been developed to
relate the results of the matching process to the truth files. These tools
allow model development to proceed in a focused manner. For example, using
these tools the model developer can build an image of all matching problems
associated with the digit three in order to focus on resolving these
problems. Combined, all these tools create a powerful environment for
building, testing, and debugging the digit models.
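
The step of relating matcher results to truth files can be sketched in a few
lines. The Python below is a minimal illustration, assuming that both the
truth file and the matcher log can be read into dictionaries keyed by a
character identifier; these layouts and all names are hypothetical and are
not the testbed's file formats.

```python
from collections import Counter


def confusions(truth, results):
    """Count (true digit, reported digit) pairs where the matcher was wrong.

    `truth` and `results` map a character id to a digit label; a missing
    entry in `results` means the matcher logged no match.
    """
    errors = Counter()
    for char_id, true_label in truth.items():
        reported = results.get(char_id)
        if reported != true_label:
            errors[(true_label, reported)] += 1
    return errors


def problems_for(digit, truth, results):
    """Return ids of characters truthed as `digit` that were misidentified."""
    return sorted(cid for cid, label in truth.items()
                  if label == digit and results.get(cid) != digit)


truth = {"img1/c0": "3", "img1/c1": "8", "img2/c0": "3", "img2/c1": "1"}
results = {"img1/c0": "3", "img1/c1": "3", "img2/c0": "8"}
errors = confusions(truth, results)
threes = problems_for("3", truth, results)   # characters to inspect for "3"
```

The list returned by problems_for identifies the segmented characters that
would be gathered into a single review image for the digit in question.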


2.3  ZIP  Code  Testbed 

A testbed for studying and evaluating various approaches to ZIP Code
recognition was developed under Phase I of the Advanced Research in
Recognition of Handwritten Address ZIP Code Project. This testbed consists
of both hardware and software components. These components represent
state-of-the-art technology in image processing/understanding and facilitate
the use of a rapid prototyping methodology on this problem. Under this
methodology, an end-to-end prototype system is first developed and then
incrementally refined until satisfactory performance criteria are met. This
methodology is known to work best on complex, difficult problems in which
the solution must be interactively derived from knowledge acquired through
experimentation and analysis. Because the ZIP Code recognition problem has
these characteristics, this methodology is exceptionally well-suited to it.
The testbed implements this methodology by providing an environment in which
ideas can be implemented and tested rapidly. It enables the researcher to
focus attention on the true recognition issues, rather than the detailed
implementation issues. We believe this is the only way that a problem this
complex can be solved in an efficient manner.

2.3.1  Testbed  Hardware 

The primary hardware components of the ZIP Code testbed are an ERIM
Cyto-HSS image processing machine and a Symbolics 3650 LISP machine.
An IP/TCP Ethernet link provides the communication between the two
machines. The Cyto-HSS provides raw processing power for pixel-based
neighborhood operations. It is able to perform high resolution
8-bit-per-pixel morphological vision operations at roughly 10 million
neighborhood operations per second [3]. The Symbolics 3650 provides an
environment for both numeric and symbolic computation.


2.3.1.1 Cyto-HSS Cytocomputer

The current fourth generation Cyto High-Speed System (Cyto-HSS) developed
at ERIM incorporates cascaded neighborhood processing stages together
with other significant processing, control, and storage units. Each
neighborhood processing stage performs 10 million complete 3x3 neighborhood
morphological operations per second in parallel on 8-bit image pixels. By
installing 10 stages into the pipeline of stages, 100 million 3x3
neighborhood morphological operations per second are performed. The
high-speed intelligent image memories can simultaneously supply and accept
8-bit image pixels at the rate of 10 million pixels per second. This is an
effective pixel rate of 20 million 8-bit pixels per second per board. The
Cyto-HSS processing is controlled from the ERIM-developed image processing
language C4PL (Cytocomputer Portable Parallel Picture Processing Language).
This software system is the result of several generations of evolution in
ERIM proprietary interactive image analysis languages. C4PL is fully
integrated with ERIM's Cyto-HSS Image Processing Systems.
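
The notion of cascaded 3x3 neighborhood stages can be illustrated in
software. The Python sketch below is not C4PL and does not model the
Cyto-HSS hardware; it assumes a binary image stored as a list of rows, uses
a 3x3 erosion as the example stage, and treats pixels outside the image as
0. The function names are hypothetical.

```python
def erode3x3(img):
    """One 3x3 binary erosion: a pixel stays 1 only if its whole
    3x3 neighborhood is 1. Pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(img[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out


def run_pipeline(img, stages):
    """Pass the image through each stage in turn, as in a cascade."""
    for stage in stages:
        img = stage(img)
    return img


def pipeline_rate(n_stages, stage_rate=10_000_000):
    """Aggregate neighborhood operations per second for n cascaded stages."""
    return n_stages * stage_rate


# A 7x7 block of ones shrinks by one pixel per side at every stage.
block = [[1] * 7 for _ in range(7)]
shrunk = run_pipeline(block, [erode3x3] * 3)   # only the center pixel remains
rate = pipeline_rate(10)                       # 100 million per second
```

Each stage applies one neighborhood operation, so a cascade of ten stages
performs ten times the operations of one stage in a single pass of the image.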


2.3.1.2 Symbolics 3650

The Symbolics 3650 is a powerful computing environment. It features a
hardware tagged memory architecture for run-time data-type-checking and
generic operations, a stack-oriented architecture with large stack buffers, a
powerful front-end microprocessor, hardware-assisted garbage collection for
memory efficiency with low software overhead, and a sophisticated and
supportive software engineering environment. The current configuration for
this machine provides 8 MB of main memory, one 368 MB Winchester disk drive,
two RS232C serial I/O ports, a built-in Ethernet interface, and 15 expansion
slots for additional options. The software environment features such
extensions as Flavors object-oriented programming, networking, window
management, graphics, multitasking, editors, and debuggers.


2.3.1.3 Networking and Communications

An initial communication capability between the Cytocomputer and the
Symbolics was developed in the first phase of this project. An overview
of this capability is shown in Figure 1. This implementation makes use
of the existing C4PL environment and its ability to program the
Cytocomputer. To do this, the VAX is introduced as an intermediate node
between the Symbolics and the Cytocomputer, with all three machines connected
via an Ethernet. The VAX is used as a C4PL host. In this configuration, the
Symbolics communicates with C4PL and, as a consequence, C4PL interacts
with the Cytocomputer. For example, if an OCR algorithm running on the
Symbolics determines that more image analysis data is needed from the
Cytocomputer to help make a good decision, it can send, via the Ethernet, a
C4PL request to a C4PL process on the VAX. C4PL reads the request from
the Ethernet and issues corresponding requests to the Cytocomputer.


2.3.2  Testbed  Software 

A major component of the testbed is software that was developed under this
phase of the contracted effort. This software builds upon the native languages
of the Cyto-HSS and the Symbolics LISP machine to provide an environment
in which recognition algorithms can be developed in an efficient way. The
testbed software is based on a library philosophy that enables ideas to be
developed from existing algorithm modules. It thus minimizes the amount
of new code that must be developed when new ideas are implemented. The
testbed software is also based on an open architecture philosophy that
permits continued extensions of the libraries of algorithm modules as the
research develops. These extensions can be made at the image processing,
segmentation, feature attribute, feature relation, and matching levels.
Under this philosophy, as new fundamental approaches to ZIP Code recognition
are identified, they can be incorporated into the testbed environment. This
facilitates rapid prototyping by providing a powerful and flexible
development environment in which research can be focused on recognition
issues.

An overview of the software system developed under this project is shown
in Figure 2. Processing within this system consists of three main phases:
transforming the raw ZIP Code image into state-labeled feature maps,
composing the resulting feature maps into a composite symbolic feature map,
and identifying ZIP Codes by matching feature-based digit models to the
composite map. The interaction between these three phases is driven by
a hierarchical matching strategy that is designed to minimize the amount
of unnecessary processing while maintaining algorithm performance. The
matching strategy itself is directed by feature-based digit models. These
models facilitate rapid development of experimental vision systems.

In the first processing phase, the raw ZIP Code image is transformed into
state-labeled feature maps. These transformations are performed by
low-level, image-processing algorithms operating on the Cytocomputer. These
algorithms are directed at locating ZIP Code features in the image. A typical
feature might be a digit window, a concavity, an end point, or a junction
point. The Cytocomputer is ideally suited to perform such operations.

In the second processing phase, the state-labeled feature maps are
transformed into a composite feature map. This is accomplished by segmenting
the individual state-labeled images and taking various measurements on the
resulting regions. Since the feature maps are represented as state-labeled
images, the process of segmentation is straightforward. Furthermore, the
segmentation is performed only on an as-needed basis. Thus, if simple
initial tests determine that an important digit feature does not exist in
certain portions of the image, then there is no need to segment additional
feature maps in those areas. Likewise, if a feature does not possess a
necessary digit attribute, e.g., size or shape, then a digit match is not
possible and there is no need to compute additional feature attributes.
Thus, the entire composite feature map is dynamically developed as needed by
the matching process, and since only information that is required for the
solution is calculated, the overall computational cost of the resulting
vision system is reduced.

In the third and final processing phase, digits are identified by matching
prototypical feature-based digit models with portions of the composite
symbolic feature map. The models are represented as ordered matching clauses
that describe how features must exist in the image in order to be classified.
These clauses identify the features that are required, the attributes that
these features must have, and the relationships that must exist between the
features. These clauses also dictate how attention will be focused in
searching for the digits within the image and how those digits will be
ultimately identified. The clauses represent an efficient, hierarchical
matching strategy that eliminates nonmatches as quickly as possible by using
simple tests. For example, if a test determines that a key digit feature
does not exist in the image, then there is no reason to continue the
matching process. In keeping with this idea, the overall testing strategy is
hierarchically organized from least to most complex, with the most complex
tests being performed on only the most promising matches.
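
The ordering of tests from least to most complex can be sketched as a list
of clauses that is abandoned at the first failure. The Python below is an
illustration only: the representation of a candidate as a dictionary, the
clause contents, and the attribute names (height, enclosed_regions,
upper_above_lower) are assumptions and are not the testbed's digit models.

```python
def match_digit(candidate, clauses):
    """Apply clauses in order and stop at the first one that fails.

    Returns (matched, number_of_clauses_evaluated).
    """
    evaluated = 0
    for test in clauses:
        evaluated += 1
        if not test(candidate):
            return False, evaluated
    return True, evaluated


# Hypothetical model for an "8": cheap tests first, costly ones last.
EIGHT = [
    lambda c: c["height"] >= 10,            # simple size attribute
    lambda c: c["enclosed_regions"] == 2,   # required features are present
    lambda c: c["upper_above_lower"],       # relationship between features
]

good = {"height": 20, "enclosed_regions": 2, "upper_above_lower": True}
# The relation is never looked up here because the second clause fails,
# so it does not even need to have been computed for this candidate.
poor = {"height": 20, "enclosed_regions": 1}

good_result = match_digit(good, EIGHT)   # (True, 3)
poor_result = match_digit(poor, EIGHT)   # (False, 2)
```

Because the failing candidate is rejected by the second clause, the more
expensive relational test is never evaluated for it.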

Figure 2. TESTBED OVERVIEW


2.3.2.1 State-Labeled Feature Map Development

The testbed uses the Cytocomputer to process the input ZIP Code image and
produce the state-labeled feature maps. A state-labeled feature map is an
image in which pixels in a given state indicate the presence of a particular
kind of feature in the original image. The Cytocomputer produces
state-labeled feature maps by transforming the original image with a series
of operators. The Cytocomputer has operators for filtering, thresholding, and
skeletonizing images, as well as operators for finding and labeling regions
in the image.

The Cytocomputer processing starts with grayscale filtering operations.
One important class of Cytocomputer filters is based on the opening and
closing operations of mathematical morphology. These filters are
tuned to specific spatial scales, and correct filtering depends on the spatial
scale of the ZIP Codes located in the imagery. Small scale filters are used to
smooth the image, removing variations that are too small to be of interest.
Large scale filters are used to estimate slowly varying backgrounds, which
may then be subtracted from the image to form a thresholdable image. The
Cytocomputer can also perform matched filtering, which tends to emphasize
digits whose shape characteristics match those of the filter.
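
Background estimation by a large scale opening can be shown on a single scan
line. The Python sketch below is not a Cytocomputer operator; it assumes a
flat structuring element, strokes that are brighter than the background (for
dark ink, the image would first be inverted or a closing used instead), and
an arbitrary threshold of 20. All names and values are illustrative.

```python
def erode(line, radius):
    """Grayscale erosion: minimum over a window of 2*radius + 1 pixels."""
    n = len(line)
    return [min(line[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def dilate(line, radius):
    """Grayscale dilation: maximum over a window of 2*radius + 1 pixels."""
    n = len(line)
    return [max(line[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def opening(line, radius):
    """Erosion followed by dilation; removes bright detail narrower
    than the window while keeping the slowly varying background."""
    return dilate(erode(line, radius), radius)


def subtract_background(line, radius):
    """Residual after removing the opening (the estimated background)."""
    background = opening(line, radius)
    return [v - b for v, b in zip(line, background)]


# A slowly rising background with a two-pixel-wide bright stroke.
scan = [10, 10, 11, 11, 60, 61, 12, 12, 13, 13]
residual = subtract_background(scan, radius=2)
binary = [int(v > 20) for v in residual]     # threshold value is arbitrary
```

The window must be wider than the stroke for the opening to remove it; this
is the sense in which the filter is tuned to a spatial scale.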

The filtered image may be used as input for edge detection or may be
thresholded to produce binary images. In the case of edge detection, the
detected edge segments are sent directly to the composite feature map. The
connected regions in binary images may also be passed directly to the
composite feature map, or they may be subjected to further Cytocomputer
processing.

In  many  cases  skeletonization  (also  called  thinning)  may  be  used  to  reduce 
binary  images  to  lines  of  single  pixel  thickness.  The  Cytocomputer  can  also 
mark  endpoints  and  junctions  in  skeletonized  images.  A  skeletonized  and 
marked  image  provides  a  very  useful  summary  of  the  overall  structure  of  the 
shapes  in  the  binary  images. 
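
Marking endpoints and junctions can be illustrated by counting the skeleton
neighbors of each skeleton pixel. The Python sketch below is not the
Cytocomputer operator; for simplicity it assumes a skeleton whose pixels are
joined only horizontally and vertically (4-connected). Skeletons with
diagonal links, which thinning often produces, would need a more careful
rule.

```python
NEIGHBORS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def mark_skeleton(skel):
    """Return (endpoints, junctions) of a 4-connected binary skeleton.

    A skeleton pixel with exactly one skeleton neighbor is taken to be an
    endpoint; one with three or more is taken to be a junction.
    """
    h, w = len(skel), len(skel[0])
    endpoints, junctions = [], []
    for y in range(h):
        for x in range(w):
            if not skel[y][x]:
                continue
            n = sum(skel[y + dy][x + dx] for dy, dx in NEIGHBORS_4
                    if 0 <= y + dy < h and 0 <= x + dx < w)
            if n == 1:
                endpoints.append((y, x))
            elif n >= 3:
                junctions.append((y, x))
    return endpoints, junctions


# A "T" shape: three stroke ends and one junction where the strokes meet.
tee = [[1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0]]
endpoints, junctions = mark_skeleton(tee)
```

The counts of endpoints and junctions give the kind of structural summary
described above; for example, the "T" has three endpoints and one junction.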

Another kind of processing for binary images involves the use of operators
from mathematical morphology. With these operators, regions passing size
and shape criteria may be kept while other regions are removed from the
image. Region borders may also be smoothed. By noting exactly what kind
of processing produced a given region, the composite feature map can record
a good deal about its regions without having to measure their properties
explicitly.

The library of Cytocomputer operations provides the building blocks to
perform a large number of image processing functions. The choice of which
operators to use, and how to combine them, depends on the problem. The goal
of this level of processing is to summarize the important information in the
image in the form of state-labeled feature maps. The regions in these maps
become the features on the composite feature map.


2.3.2.2 Composite Map Development

The second processing phase within the system is focused on the development
of the composite symbolic feature map. This phase is critical to the overall
system, since it provides the bridge between the pixel-based processing of
low-level vision and the symbolic-based processing of high-level vision. The
segmenter, which extracts features for the composite map from the
state-labeled feature maps, must provide a rich set of techniques for
identifying a wide range of features. The data structures that define the
composite map must be flexible to allow long-term system enhancements. They
must also be efficient to enable rapid information retrieval over a wide
range of queries.

The  representation  of  the  composite  map  determines  much  of  the  overall 
system  flexibility  and  performance.  An  overview  of  the  map  representation 
is  presented  in  Figure  3.  As  illustrated  in  this  figure,  the  map  is  an  object 
with  two  primary  components:  a  feature  list  and  a  relation  hash  table.  Both 
facilitate  access  to  information  in  the  composite  map.  This  access,  however, 
is  somewhat  varied.  The  feature  list  is  an  indexed  list  of  features  identified 
within  the  image.  This  list  provides  quick  and  ready  access  to  features 
located  anywhere  in  the  image.  Typically,  this  form  of  map  access  is  used 
to  locate  features  within  some  region  of  interest  in  the  image.  The  relation 
hash  table  is  a  simple  hash  table  object.  This  table  contains  previously 
calculated  relationships  between  features  of  the  composite  map.  Typically, 
it  is  accessed  anytime  a  relationship  is  requested  to  see  if  that  relationship 
has already been determined. Together the feature list and the relation hash
table facilitate efficient access to composite map information.

The  primary  component  of  the  feature  list  is  a  feature.  Within  the 
testbed,  a  feature  is  represented  as  a  simple  object  with  a  region  and  an 
attached  property  list.  The  region  locates  the  feature  boundaries  within  the 
original  image.  The  property  list  holds  the  feature  type  and  state  produced 
by the segmenter. The type identifies the state-labeled image which produced
the feature. The state identifies its intensity level within the image.
Run-length encoding is used as the representation scheme by the testbed
segmenter. A library of methods uses this representation to calculate feature
attributes. A scheme is used which intercepts the messages defined by these
methods. If a message has not been received before, it is allowed to
continue, and the resulting attribute value is added to the feature property
list. If a message has been received before, it is aborted, and the desired
result is simply recalled from the property list. In this way, feature
attributes are computed once when needed, and redundant calculations are
eliminated.
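
The compute-once behavior described above can be sketched as follows. This is
a minimal Python illustration, not the testbed's ZetaLISP code: the class
name, the (row, start, end) run layout, and the two sample attributes are
assumptions made for the example.

```python
class Feature:
    """A run-length-encoded region plus a property list that caches attributes."""

    def __init__(self, runs, feature_type, state):
        # Each run is (row, first_column, last_column), both columns inclusive.
        self.runs = list(runs)
        self.properties = {"type": feature_type, "state": state}
        self.computations = 0  # counts real calculations, for illustration only

    def attribute(self, name):
        # Intercept the request: recall a stored value if one exists.
        if name in self.properties:
            return self.properties[name]
        value = getattr(self, "_compute_" + name)()
        self.computations += 1
        self.properties[name] = value  # remember it on the property list
        return value

    def _compute_area(self):
        return sum(end - start + 1 for _, start, end in self.runs)

    def _compute_bounding_box(self):
        rows = [row for row, _, _ in self.runs]
        return (min(rows), min(s for _, s, _ in self.runs),
                max(rows), max(e for _, _, e in self.runs))


feature = Feature([(0, 2, 4), (1, 1, 5), (2, 2, 3)], "cavity-map", 3)
first = feature.attribute("area")   # calculated from the runs
second = feature.attribute("area")  # recalled from the property list
```

Only the first request for an attribute touches the run-length data; later
requests are dictionary lookups.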

Features  are  extracted  from  the  state-labeled  maps  and  placed  into  the 
composite  map  by  the  segmenter.  Since  this  segmenter  operates  exclusively 
on  state-labeled  images,  it  is  much  simpler  than  corresponding  gray-level 
segmenters.  The  segmenter  simply  identifies  features  as  regions  in  the  state- 
labeled  image.  For  each  identified  region,  the  routine  then  creates  a  new 
feature  object,  attaches  its  region  description,  and  adds  it  to  the  composite 
map. 

The  primary  component  of  the  relation  hash  table  is  a  description  of  a 
relationship  between  two  features  and  a  relationship  value.  The  description 
is  stored  as  a  Lisp  form,  and  used  as  an  index  into  the  hash  table.  The 
relationship  value  represents  the  result  of  the  relationship  calculation.  If  the 
relationship  has  been  calculated  once,  the  hash  table  is  used  to  retrieve  the 
resultant  value,  and  thus  avoid  unnecessary  recomputation. 

Figure 3. Composite Feature Map Representation
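
The relation hash table amounts to a cache keyed by a description of the
relationship. The Python sketch below illustrates the idea under stated
assumptions: a tuple stands in for the Lisp form used as the index, features
are represented by bounding boxes, and the names RelationTable, lookup, and
north_of are hypothetical.

```python
class RelationTable:
    """Cache of previously calculated relationships between features."""

    def __init__(self):
        self.table = {}
        self.calculations = 0  # counts real calculations, for illustration only

    def lookup(self, name, predicate, a, b):
        key = (name, a, b)  # stands in for the Lisp form used as the index
        if key not in self.table:
            self.calculations += 1
            self.table[key] = predicate(a, b)
        return self.table[key]


def north_of(a, b):
    # Boxes are (top, left, bottom, right); row numbers grow downward.
    return a[2] < b[0]


relations = RelationTable()
top_box, bottom_box = (0, 2, 4, 9), (6, 1, 12, 8)
answers = [relations.lookup("north-of", north_of, top_box, bottom_box)
           for _ in range(3)]
```

The relationship is calculated on the first request and retrieved from the
table on the other two.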


2.3.2.3 Model Development and Matching

The  third  processing  phase  within  the  system  is  focused  on  matching  digit 
models  to  composite  map  facts.  The  models  describe  how  selected  features 
must  exist  in  the  image  to  be  classified  as  a  particular  digit.  The  models  are 
either  primitive  or  complex.  The  primitive  models  correspond  to  features  in 
the  state-labeled  feature  map.  The  complex  models  consist  of  models,  model 
attributes,  and  model  relations.  The  attributes  specialize  the  models  and 
determine  their  size,  shape,  orientation,  etc.  The  relations  define  the  spatial 
qualities, e.g. above, next-to, or between, that exist between the models.
This  recursive  model  definition  enables  the  development  of  complex  models 
from  simpler  models.  It  also  facilitates  the  development  of  model  packages 
that  can  be  developed  and  placed  into  libraries. 

The  models  are  developed  in  a  model-matching  language  developed  at 
ERIM.  This  language  is  comprised  of  clauses  that  can  be  organized  into  four 
different  classes:  bindings-operator,  free-set-operator,  side-effect-operator, 
and  control-operator.  The  bindings-operator  clauses  either  generate  or  prune 
the  candidate  matching  bindings.  The  free-set-operator  clauses  manipulate 
the  free-set  from  which  the  candidate  bindings  are  selected.  The  side-effect- 
operator  clauses  generate  a  desired  side-effect.  The  control-operator  clauses 
determine  the  flow  of  control  for  clause  evaluation.  Together  these  clauses 
create  a  powerful  model-matching  language  which  is  well  integrated  into  the 
Symbolics LISP machine environment.

The model-matching language can be defined in BNF form as seen in
Figure  4.  This  formal  definition  specifies  the  syntax  that  is  used  in  writing 
models.  The  clauses  shown  in  this  definition  are  organized  into  the  four 
classes  outlined  above  in  Figure  5.  This  organization  is  used  below  to  describe 
the  semantics  associated  with  each  clause. 


Binding-Operator  Clauses 

The  bindings-operator  clauses  define  the  way  in  which  bindings  are  handled 
by  the  model-matching  language.  These  clauses  can  be  further  broken  down 
into  two  groups:  those  that  generate  bindings  and  those  that  prune  bindings. 
Included  within  the  clauses  that  generate  bindings  are  the  require,  allow,  and 
bind  clauses.  Included  within  the  clauses  that  prune  bindings  are  the  test, 
ordered,  pairwise,  and  and-nothing-else  clauses. 


Require 

The require clause is used to bind a matched submodel to a variable. It
generates all possible bindings for each entry on the current binding list. For
each new binding, features are matched and removed from the free-set. If
no match is possible for a particular binding, that binding is removed from
the current binding list. Thus, this clause has both generating and pruning
characteristics. It is classified as a generating clause, however, since it has
overwhelming generating potential. In fact, it is recommended that this
clause be used only when the binding list and free-set list have been pruned
as much as possible to minimize the computational complexity associated
with its usage.

    ⟨model⟩                    → (defmodel (⟨arg-list⟩) (⟨clause-list⟩))
    ⟨arg-list⟩                 → :color ⟨color⟩ | ⟨empty⟩
    ⟨color⟩                    → :black | :yellow | :red | :green | ...
    ⟨clause-list⟩              → primitive | ⟨clause⟩ ⟨clause-list⟩ | ⟨empty⟩
    ⟨clause⟩                   → ⟨require-clause⟩ | ⟨allow-clause⟩ | ⟨eval-clause⟩
                                 | ⟨bind-clause⟩ | ⟨free-set-clause⟩ | ⟨ignore-clause⟩
                                 | ⟨test-clause⟩ | ⟨ordered-clause⟩ | ⟨pairwise-clause⟩
                                 | ⟨if-clause⟩ | ⟨forbid-clause⟩ | ⟨and-nothing-else-clause⟩
    ⟨require-clause⟩           → (require ⟨var⟩ ⟨model-or-list⟩)
    ⟨allow-clause⟩             → (allow ⟨var⟩ ⟨model-or-list⟩)
    ⟨eval-clause⟩              → (eval ⟨any-Lisp-form⟩)
    ⟨bind-clause⟩              → (bind ⟨var⟩ ⟨any-Lisp-form⟩)
    ⟨free-set-clause⟩          → (free-set ⟨any-Lisp-form⟩)
    ⟨ignore-clause⟩            → (ignore ⟨Lisp-single-arg-predicate⟩)
    ⟨test-clause⟩              → (test ⟨any-Lisp-form⟩)
    ⟨ordered-clause⟩           → (ordered ⟨Lisp-two-arg-predicate⟩ ⟨model-list⟩)
    ⟨pairwise-clause⟩          → (pairwise ⟨Lisp-two-arg-predicate⟩ ⟨model-list⟩)
    ⟨if-clause⟩                → (if ⟨any-Lisp-form⟩ ⟨clause⟩)
    ⟨forbid-clause⟩            → (forbid ⟨any-Lisp-form⟩)
    ⟨and-nothing-else-clause⟩  → (and-nothing-else)
    ⟨var⟩                      → ⟨any-Lisp-symbol⟩
    ⟨model-list⟩               → ⟨model⟩ ⟨model-list⟩ | ⟨empty⟩
    ⟨model-or-list⟩            → ⟨model⟩ ⟨or⟩ ⟨model-or-list⟩ | ⟨empty⟩
    ⟨or⟩                       → or | ⟨empty⟩

Figure 4: BNF Definition of Matching Language


Allow 

The allow clause is identical to the require clause, except that it does not
prune any current bindings which fail to produce a match. Furthermore, this
clause carries forward all current bindings as they were before this clause was
evaluated. Because of these two characteristics, this clause is exclusively a
generating clause, and its generating potential exceeds that of the require
clause. Like the require clause, it should thus be used only when the binding
list and free-set list have been pruned as much as possible.
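
The difference between the require and allow clauses can be illustrated with
a small Python sketch. It rests on simplifying assumptions that are not part
of the testbed: a binding-list entry is a (bindings, free-set) pair, a
feature is an (id, type) tuple, and a model matches a feature when the type
names are equal, whereas the real matcher matches submodels recursively.

```python
def require(entries, var, model):
    """Bind var to every free feature that matches model.

    Entries for which no feature matches produce nothing, so they are
    pruned from the binding list.
    """
    result = []
    for bindings, free in entries:
        for feature in sorted(free):
            if feature[1] == model:
                # The matched feature is removed from this entry's free-set.
                result.append(({**bindings, var: feature}, free - {feature}))
    return result


def allow(entries, var, model):
    """Like require, but every incoming entry is also carried forward unchanged."""
    return list(entries) + require(entries, var, model)


features = frozenset({(1, "hole"), (2, "hole"), (3, "stroke")})
start = [({}, features)]
required = require(start, "bottom", "hole")        # two candidate bindings
pruned = require(start, "top", "east-cavity")      # no match: entry is dropped
allowed = allow(start, "extra", "east-cavity")     # no match: entry survives
```

Because allow never shrinks the binding list, applying it to a large free-set
multiplies the number of candidates, which is why pruning first is advised.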


Bind 

The bind clause is used to bind the result of a LISP form to a symbol. In
essence, this operation defines a local variable for use in the matching process.
Since the LISP form can refer to variables that are bound to unique values for
each binding in the current binding list, this operation adds a new binding to
each binding on the current binding list. This operation is extremely useful
when calculating local results that will be used over and over again.


Test 

The  test  clause  is  used  to  make  sure  that  bindings  meet  certain  requirements. 
For  each  binding  in  the  current  binding  list,  this  operation  substitutes  the 
values associated with the local variables into the LISP form. The resulting
LISP form is then evaluated. If the evaluation is false, then the binding is
removed from the current binding list. Otherwise, the binding passes the test
and remains on the list. This operation provides the mechanism for assuring
that model-attribute and model-relationship requirements are satisfied.
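
A minimal Python sketch of the bind and test clauses follows. It assumes each
binding is a dictionary from symbols to values and that a Python function of
the binding stands in for the LISP form; apply_bind and apply_test are
hypothetical names, not part of the model-matching language.

```python
def apply_bind(binding_list, symbol, form):
    """Add symbol to every binding, set to the value of form for that binding."""
    return [{**binding, symbol: form(binding)} for binding in binding_list]


def apply_test(binding_list, form):
    """Keep only the bindings for which form evaluates to true."""
    return [binding for binding in binding_list if form(binding)]


candidates = [
    {"top": {"row": 2}, "bottom": {"row": 7}},
    {"top": {"row": 9}, "bottom": {"row": 3}},
]
# Compute a local result once per binding, then prune on it.
with_gap = apply_bind(candidates, "gap",
                      lambda b: b["bottom"]["row"] - b["top"]["row"])
survivors = apply_test(with_gap, lambda b: b["gap"] > 0)
```

Only the first candidate, whose bottom lies below its top, remains on the
binding list.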


Ordered 

The ordered clause is an extended version of the test clause, in that it
performs several tests. For each binding in the current binding list, this
operation first creates local bindings for the symbols on the model list. If any
of the symbols remain unbound, they are removed. Then the values of the
remaining first and second symbols are applied to the two-argument
predicate. If the result is false, the binding is removed from the current binding
list. Otherwise, the values of the second and third remaining symbols are
applied to the predicate. Again, if the result is false, the binding is removed.
This process is then continued until it exhausts the ordered symbol values.
At most n - 1 applications of the predicate can result from this operation,
where n is the length of the remaining symbol list. If the results of all these
applications are true, then the binding remains a match candidate on the
current binding list.


Pairwise 

The pairwise clause is an extension of the ordered clause, in that it performs
additional tests. For each binding in the current binding list, this operation
first creates local bindings for the symbols on the model list. If any of the
symbols remain unbound, they are removed. Then the values of the
remaining first and second symbols are applied to the two-argument predicate. If
the result is false, the binding is removed from the current binding list.
Otherwise, the values of the first and third remaining symbols are applied to the
predicate. Again, if the result is false, the binding is removed. This process
is then continued until it exhausts the pairwise symbol values. The number
of pairwise combinations that exist in the remaining symbol list, n(n - 1)/2
for n symbols, is the number of potential applications of the predicate that
can result from this operation. If the results of all these applications are
true, then the binding remains a match candidate on the current binding list.
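
The ordered and pairwise clauses differ only in which pairs of values reach
the predicate: adjacent pairs for ordered, all pairs for pairwise. The Python
sketch below illustrates this, assuming bindings are dictionaries in which an
unbound symbol is missing or None; the function names are hypothetical.

```python
from itertools import combinations


def bound_values(binding, symbols):
    """Values of the listed symbols, skipping any that remain unbound."""
    return [binding[s] for s in symbols if binding.get(s) is not None]


def apply_ordered(binding_list, predicate, symbols):
    """Keep bindings whose adjacent values all satisfy the predicate."""
    kept = []
    for binding in binding_list:
        values = bound_values(binding, symbols)
        if all(predicate(a, b) for a, b in zip(values, values[1:])):
            kept.append(binding)
    return kept


def apply_pairwise(binding_list, predicate, symbols):
    """Keep bindings in which every pair of values satisfies the predicate."""
    kept = []
    for binding in binding_list:
        values = bound_values(binding, symbols)
        if all(predicate(a, b) for a, b in combinations(values, 2)):
            kept.append(binding)
    return kept


def close(a, b):
    return abs(a - b) <= 1


symbols = ["top", "extra", "bottom"]
binding = {"top": 1, "extra": 2, "bottom": 3}
by_ordered = apply_ordered([binding], close, symbols)    # tests (1,2), (2,3)
by_pairwise = apply_pairwise([binding], close, symbols)  # also tests (1,3)
```

Here the binding survives the ordered clause but is removed by the pairwise
clause, because only the latter compares the first and third values.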


And-Nothing-Else

The and-nothing-else clause is used to test the free-set to make sure that
everything is accounted for. For each binding in the current binding list, this
operation checks to see if the free-set list is empty. If it is not, then the
binding is removed from the current binding list. If it is, then the binding
remains a match candidate on the current binding list.


Free-Set-Operator  Clauses 

The  free-set-operator  clauses  provide  a  mechanism  for  accessing  the  free-set. 
Three clauses fall into this class: and-nothing-else, free-set, and ignore. The
and-nothing-else  clause  was  described  above  as  a  binding-operator  clause.  It 
can  also  be  thought  of  as  a  free-set  operator  because  it  bases  its  action  on 
the  state  of  the  free-set.  However,  its  overwhelming  characteristic  is  to  affect 
the  current  binding  list,  and  because  of  this  it  is  best  thought  of  as  a  binding 
operator.  The  other  two  clauses  are  strictly  free-set  operators,  since  they 
have  no  effect  on  the  binding  list. 


Free-Set 

The  free-set  clause  is  used  to  update  the  free-set  associated  with  each  binding 
of  the  current  binding  list.  For  each  binding,  this  operation  substitutes  the 
values associated with the local variables into the LISP form. The free-set
associated with the binding is also substituted for the symbol free-set. The
resulting LISP form is then evaluated, and the resultant value becomes the
new  free-set  associated  with  the  binding.  This  operation  can  either  add  or 
subtract  entries  in  the  free-set.  It  is  useful  when  significant  changes  in  the 
free-set  are  required  during  the  matching  operation. 


Ignore 

The  ignore  clause  is  used  to  prune  the  free-set  associated  with  each  binding 
of  the  current  binding  list.  This  operation  applies  each  element  of  the  free-set 
associated with a particular binding to a specified single-argument predicate
function. If the result is false, the element is removed from the free-set. If
it is true, it remains on the free-set. This clause is especially useful in pruning
down  large  free-sets  after  some  initial  evaluation.  This,  in  turn,  results  in 
significant  computational  savings. 
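
The following Python sketch shows the ignore clause as worded above (elements
for which the predicate is true remain on the free-set), together with the
and-nothing-else check that benefits from it. Entries are assumed to be
(bindings, free-set) pairs and features (name, area) tuples; these
representations and the function names are illustrative only.

```python
def apply_ignore(entries, predicate):
    """Prune each free-set, keeping the elements for which predicate is true."""
    return [(bindings, frozenset(f for f in free if predicate(f)))
            for bindings, free in entries]


def apply_and_nothing_else(entries):
    """Keep only the entries whose free-set is empty."""
    return [(bindings, free) for bindings, free in entries if not free]


def big_enough(feature):
    return feature[1] >= 5  # hypothetical area cutoff


entries = [({"top": "f1"}, frozenset({("speck", 2)}))]
unpruned = apply_and_nothing_else(entries)
pruned = apply_and_nothing_else(apply_ignore(entries, big_enough))
```

Without the ignore step the leftover speck blocks the match; with it the
free-set is empty and the binding remains a candidate.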


Side-Effect-Operator  Clauses 

The side-effect-operator clause provides a mechanism for evaluating
procedures that generate side effects, e.g. outputting graphical displays. Only one
clause falls into this class, the eval clause. This single clause, however, is very
powerful, since it enables the matcher to communicate with its environment.


Eval 

The eval clause is used to generate side effects. For each binding, this
operation substitutes the values associated with the local variables into the
LISP form. The resulting LISP form is then evaluated. It has no effect on
either the current binding list or the corresponding free-sets. This clause is
especially useful in communicating with the LISP environment.


Control-Operator  Clauses 

The  control-operator  clauses  provide  a  mechanism  for  directing  the  flow  of 
control  through  the  matching  process.  Two  clauses  fall  into  this  class,  forbid 
and  if.  These  clauses  allow  the  matcher  to  recognize  when  no  match  is 
possible  and  when  certain  operations  should  be  performed. 


Forbid 

The  forbid  clause  is  used  to  abort  the  current  match  process.  This  operation 
applies  each  element  of  the  free-set  associated  with  a  particular  binding  to  a 
specified single-argument predicate function. If the result of any application
is true, the matching operation is aborted immediately. This operation is
especially useful for doing a scan of the free-set to see if something exists
that would make a match impossible. When such an occurrence exists, there
is simply no reason to expend any further effort in trying to coerce a match;
it cannot be done.
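
A minimal Python sketch of the forbid clause is given below. It assumes that
aborting the match can be modeled by returning an empty binding list, and
that entries are (bindings, free-set) pairs; how the testbed matcher actually
signals an abort is not stated in this report.

```python
def apply_forbid(entries, predicate):
    """Abort the match if any free-set element satisfies the predicate."""
    for _, free in entries:
        if any(predicate(feature) for feature in free):
            return []  # nothing can match, so every candidate is abandoned
    return entries


def is_second_character(feature):
    return feature == "character2"  # hypothetical disqualifying feature


clean = [({"top": "f1"}, frozenset({"hole"}))]
crowded = clean + [({"top": "f2"}, frozenset({"hole", "character2"}))]
kept = apply_forbid(clean, is_second_character)
aborted = apply_forbid(crowded, is_second_character)
```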


If 

The  if  clause  is  used  to  determine  whether  certain  clauses  should  be  used 
in  the  matching  process.  For  each  binding,  this  operation  substitutes  the 
values  associated  with  the  local  variables  into  the  LISP  form.  The  resulting 
LISP form is then evaluated. If the resulting value is true, then the
conditional clause is invoked on the current binding. This conditional clause then
determines the final state of the binding. If the resulting value is false, the
binding is unaffected. This clause is especially useful for making sure that all
the preconditions for the conditional clause are satisfied.

The  digit  models  are  developed  in  the  model-matching  language  just 
described.  The  process  begins  by  carefully  analyzing  the  feature  qualities 
associated with digits or digit components. This analysis results in an
understanding of how features can be used to describe a particular numeric
character. This description includes the existence of particular features,
measurable attributes for the resulting features, and spatial relationships between
those features. This description builds upon components of the state-labeled
feature maps and the composite symbolic feature map created from the first
two processing phases. The ordering of the model clauses also determines the
efficiency of the resulting matching process.


2.4  Prototype  ZIP  Code  System 

A prototype ZIP Code recognition system was developed under Phase I of
the Advanced Research in Recognition of Handwritten Address ZIP Codes
project. This system provides a first-cut implementation of an end-to-end
system for recognizing ZIP Codes. It was developed on the testbed described
above. This prototype system serves several purposes. It provides a tool for
evolving specifications for a robust ZIP Code recognition system. It provides a
framework for exploring the interaction between each of the system phases.
It also provides a mechanism for extending system performance through
incremental testing, evaluating, and refining. This system thus becomes an
integral part of the overall rapid prototyping and iterative refinement
research methodology.

The  prototype  system  goes  through  several  processing  phases.  In  the  first 
phase, the initial, grey-level image is transformed into a binary image. Next,
the last address line is extracted from the binary image. The resultant
last  line  image  is  then  segmented  into  characters  and/or  symbols.  Features 
are  then  generated  for  these  characters.  The  resultant  state-labeled  feature 
map  is  then  segmented  to  produce  a  composite  symbolic  feature  map.  Digit 
models  are  then  matched  to  portions  of  this  map.  The  resultant  identified 
digits  are  then  assembled  into  a  five-digit  ZIP  Code.  Each  of  these  processing 
steps  is  described  in  detail  below. 


2.4.1  Binary  Image  Generation 


The first processing phase in the prototype system transforms the initial,
raw, grey-level image into a two-valued, binary image. A simple thresholding
approach was implemented on the ERIM Cytocomputer to accomplish
this task. Under this approach, limited testing of sample input images was
performed to determine an appropriate threshold or cutoff value to use in
separating the background of the address block from the stroke. This value
was then used to implement the Cyto thresholding algorithm. Within this
algorithm, morphological operations were used to sort pixels in the initial,
input image into those belonging to the stroke and those belonging to the
background by determining whether each pixel value was above or below the
selected threshold. Those values that were below the selected threshold were
classified as belonging to the background. The input to this Cyto
algorithm is an image from the SUNY Buffalo address block data set. The
output is an image in which the character strokes of the address block have
one value and the background has another value.
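
The effect of the thresholding step can be sketched in a few lines of Python.
A plain comparison stands in here for the Cytocomputer's morphological
operations, and the function name, the list-of-rows image layout, and the
output values 1 and 0 are assumptions made for the example.

```python
def threshold(image, cutoff, stroke=1, background=0):
    """Map grey levels below cutoff to background and the rest to stroke."""
    return [[background if pixel < cutoff else stroke for pixel in row]
            for row in image]


grey = [[200, 40],
        [90, 130]]
binary = threshold(grey, cutoff=100)
```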


2.4.2 Last Line Extraction


Next,  the  prototype  recognition  system  locates  the  last  line  in  the  binary 
image.  As  a  first  attempt  at  this  portion  of  the  overall  recognition  process, 
a  histogramming  approach  was  implemented  on  the  ERIM  Cytocomputer. 
Figure  6  shows  a  pictorial  representation  of  the  resulting  algorithm.  As  seen 
in  the  figure,  this  approach  begins  by  migrating  the  stroke  pixels  in  the  binary 
image  in  a  leftward  direction  to  create  a  histogram.  The  height  of  each  entry 
in  this  histogram  indicates  the  number  of  pixels  located  on  each  horizontal 
raster  line  of  the  binary  image.  Once  the  histogram  is  computed,  it  is  sliced 
by  removing  a  fixed  number  of  pixels  from  each  line.  Any  small  gaps  in  the 
resulting  sliced  histogram  are  then  removed  to  produce  line  locations  within 
the  image.  The  last  such  line  is  finally  windowed  and  placed  into  a  last  line 
image. 
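
The histogramming steps above can be sketched in Python as follows. This is
an illustration under stated assumptions rather than the Cytocomputer
algorithm: the image is a list of rows of 0/1 pixels, and the parameters
slice_count and max_gap are hypothetical names for the fixed number of pixels
removed from each histogram entry and the largest gap that is closed.

```python
def find_runs(profile, min_count, max_gap):
    """Index ranges where profile exceeds min_count, merging small gaps."""
    runs = []
    for index, count in enumerate(profile):
        if count > min_count:
            if runs and index - runs[-1][1] - 1 <= max_gap:
                runs[-1][1] = index  # extend the current run across the gap
            else:
                runs.append([index, index])
    return [tuple(run) for run in runs]


def last_line(binary, slice_count=1, max_gap=1):
    """Return the rows of the last text line found in a binary image."""
    row_histogram = [sum(row) for row in binary]  # stroke pixels per raster line
    lines = find_runs(row_histogram, slice_count, max_gap)
    if not lines:
        return []
    top, bottom = lines[-1]
    return binary[top:bottom + 1]


page = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],  # a single noise pixel, removed by the slice
]
window = last_line(page)
```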


2.4.3  Character  Segmentation 

Individual  characters  within  the  binary  last  line  image  are  then  identified. 
A  histogramming  approach  similar  to  the  one  used  in  last  line  extraction 
was implemented on the ERIM Cytocomputer. Figure 7 shows a pictorial
representation of the resulting algorithm. As seen in this figure, the approach
begins by migrating the stroke pixels in the last line image in a downward
direction to create the histogram. In this histogram, the height of each entry
indicates the number of pixels located on each vertical raster line of the
last  line  image.  As  before,  small  gaps  are  removed  from  this  histogram  to 
produce  estimated  character  locations  within  the  last  line.  These  estimates 
are  then  used  to  produce  a  last  line  image  in  which  the  individual  characters 
are  isolated  and  windowed. 
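
A Python sketch of the column-histogram segmentation is shown below. It
assumes a list-of-rows binary image and a hypothetical max_gap parameter for
the widest gap that is closed; touching or overlapping characters, which a
column histogram cannot separate, are outside its scope.

```python
def segment_characters(line_image, max_gap=0):
    """Split a binary last-line image into one sub-image per character."""
    # Stroke pixels per vertical raster line.
    column_histogram = [sum(column) for column in zip(*line_image)]
    spans = []
    for x, count in enumerate(column_histogram):
        if count == 0:
            continue
        if spans and x - spans[-1][1] - 1 <= max_gap:
            spans[-1][1] = x  # close a small gap inside one character
        else:
            spans.append([x, x])
    return [[row[left:right + 1] for row in line_image]
            for left, right in spans]


line = [
    [1, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 1],
]
characters = segment_characters(line)
merged = segment_characters(line, max_gap=1)
```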


2.4.4  Feature  Generation 

The segmented last line image is next processed to produce a state-labeled
image. A concavity approach was implemented on the ERIM Cytocomputer
for demonstration in the prototype system. Figure 8 shows a pictorial
representation of the resulting features. As seen in this figure, six unique features
are developed in this process: north-cavities, south-cavities, east-cavities,
west-cavities, center-cavities, and holes. The north-cavities are defined as
contiguous sets of pixels in the image that would not hit a stroke if moved
vertically in a northward (upward) direction but would hit a stroke if moved
vertically in a southward (downward), horizontally in an eastward (right), or
horizontally in a westward (left) direction. The south-cavities, east-cavities, and
west-cavities are similarly defined. The holes are defined as contiguous sets
of pixels in the image that are completely encircled by stroke pixels. The
center-cavities are defined as contiguous sets of pixels in the image that are
not holes but from which any of the four movements would hit a stroke.
These features are represented in the state-labeled image that results from
this processing phase as tagged pixels in which each grey-level value
corresponds to a particular feature. Figure 9 shows a pictorial representation of
the features that would result for various representations of digit six.
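
The cavity definitions above can be made concrete with a per-pixel Python
sketch. It is an illustration only, not the Cytocomputer implementation: it
assumes a list-of-rows binary image, uses text labels in place of grey-level
states, and treats a hole as background that a 4-connected flood fill from
the image border cannot reach.

```python
DIRECTIONS = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


def hits_stroke(image, r, c, dr, dc):
    """True if moving from (r, c) in direction (dr, dc) meets a stroke pixel."""
    r, c = r + dr, c + dc
    while 0 <= r < len(image) and 0 <= c < len(image[0]):
        if image[r][c]:
            return True
        r, c = r + dr, c + dc
    return False


def outside_background(image):
    """Background pixels reachable from the image border (4-connected)."""
    rows, cols = len(image), len(image[0])
    stack = [(r, c) for r in range(rows) for c in range(cols)
             if (r in (0, rows - 1) or c in (0, cols - 1)) and not image[r][c]]
    seen = set(stack)
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not image[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                stack.append((nr, nc))
    return seen


def label_cavities(image):
    """Return a state-labeled image: one label per pixel."""
    outside = outside_background(image)
    labels = [["stroke" if p else "background" for p in row] for row in image]
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel:
                continue
            open_dirs = [name for name, (dr, dc) in DIRECTIONS.items()
                         if not hits_stroke(image, r, c, dr, dc)]
            if not open_dirs:  # blocked in all four directions
                labels[r][c] = "center-cavity" if (r, c) in outside else "hole"
            elif len(open_dirs) == 1:  # open in exactly one direction
                labels[r][c] = open_dirs[0] + "-cavity"
    return labels


cup = [[1, 0, 0, 0, 1],
       [1, 0, 0, 0, 1],
       [1, 1, 1, 1, 1]]
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
cup_labels = label_cavities(cup)
ring_labels = label_cavities(ring)
```

Contiguous pixels that share a label then form the regions that the feature
segmenter extracts.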


2.4.5  Feature  Segmentation 

The  features  are  then  extracted  from  the  state-labeled  image  and  placed  into 
a composite symbolic map. A segmenter was implemented on the Symbolics
LISP machine in ZetaLISP to perform this task. An overview of the
functionality of the segmenter is graphically represented in Figure 10. In this figure,
a state-labeled image for a typical six is segmented into its various feature
components: strokes and cavities. The segmentation process is
straightforward. First, the state-labeled feature image is scanned from top to bottom
and left to right to locate simply connected components of contiguous pixels
with the same state or grey value. Next, the same-state, simply connected
components which touch are assembled into more complex components.
Finally, the same-state, connected components are identified and placed into
the composite map as objects with specified run-length regions.
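
The three segmentation steps can be sketched in Python as below. The sketch
is not the ZetaLISP segmenter: it assumes a list-of-rows image of integer
states with 0 as background, treats runs on adjacent rows as touching when
their column ranges overlap, and returns plain dictionaries in place of
feature objects.

```python
def segment(state_image, background=0):
    """Group same-state runs into connected features with run-length regions."""
    # Step 1: scan top to bottom, left to right, collecting runs of equal state.
    runs = []
    for r, row in enumerate(state_image):
        c = 0
        while c < len(row):
            start = c
            while c + 1 < len(row) and row[c + 1] == row[start]:
                c += 1
            if row[start] != background:
                runs.append((r, start, c, row[start]))
            c += 1

    # Step 2: merge touching runs of the same state (union-find).
    parent = list(range(len(runs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (r1, s1, e1, state1) in enumerate(runs):
        for j in range(i + 1, len(runs)):
            r2, s2, e2, state2 = runs[j]
            if r2 > r1 + 1:
                break  # runs are in row order, so nothing later can touch
            if r2 == r1 + 1 and state1 == state2 and s2 <= e1 and s1 <= e2:
                parent[find(j)] = find(i)

    # Step 3: emit one feature per component, with its runs as the region.
    features = {}
    for i, (r, s, e, state) in enumerate(runs):
        feature = features.setdefault(find(i), {"state": state, "runs": []})
        feature["runs"].append((r, s, e))
    return list(features.values())


labeled = [[1, 1, 0, 2],
           [0, 1, 0, 2],
           [0, 0, 0, 0],
           [1, 0, 0, 0]]
features = segment(labeled)
```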


2.4.6  Model  Matching 

The  digits  are  next  identified  in  the  composite  map  by  matching  models  to 
the  extracted  features.  Two  major  efforts  were  performed  to  implement  this 
capability in the prototype system: a model matcher was implemented and
initial digit models were developed.

Figure 9. Typical Sixes

The model matcher was developed on the Symbolics LISP machine in
ZetaLISP. It implements the testbed model matching language described
above. As depicted in Figure 11, the matcher operates like a large sieve to
filter  out  match  candidates.  An  overview  of  the  matching  process  for  a 
typical  six  characterization  is  illustrated  in  Figure  12.  In  this  figure  matching 
is  represented  in  a  graphical  form.  Within  the  graph  the  undirected  edges 
(no  arrows)  represent  logical  components.  Thus,  a  six  is  made  up  of  a  top, 
a  bottom,  and  in  some  instances  an  extra  part.  The  solid  directed  edges 
(arrows) in the graph represent actual matching of the logical component to
a  feature  in  the  composite  map.  Thus,  the  six  bottom  must  match  either 
a  center  cavity  or  a  hole.  Also  seen  in  the  figure  are  relations  and  tests 
that  constrain  the  match.  These  components  of  the  matching  process  test 
individual  feature  attributes  and  spatial  relationships  between  features  that 
must  exist  in  a  successful  match.  The  figure  also  illustrates  the  hierarchical 
nature  of  the  matcher  which  enables  building  complex  models  from  simpler 
models. Thus, in the case of the six, a top submodel can be developed and
used  to  build  the  six  model. 

The  initial  digit  models  were  developed  in  the  model  matching  language. 
Figure  13  shows  an  example  of  a  fully  developed  six  model.  All  the  models 
developed  in  this  phase  were  based  on  the  cavity  features  described  above. 
These features were used to build concavity models, complexes of touching
cavities. The concavity and cavity models were then collectively used
to develop digit models for each of the ten digits. Additional digit models
were developed for the digits that had more than one characteristic
representation. For example, several different models were developed for the digit
two, including the loop-two model, the non-loop-two model, and the
lazy-two model. Each represents a different morphology of the two concept, and
as  such  is  explicitly  modeled.  Throughout  the  development  of  the  models, 
sample  images  were  used  to  identify  new  digit  models  and  to  test  matching 
performance. 

    (defmodel six
      (:color :orange)
      (ignore small-ones)
      (require window digit-feature)
      (require character stroke)
      (allow character stroke)
      (forbid character2)
      (require top se-concavity)
      (require bottom center-cavity or hole)
      (test (any-south-of bottom top))
      (allow extra center-cavity)
      (bind top-s-part (sub-part top 's-part))
      (if (and extra top-s-part)
          (test (south-neighbor-of extra top-s-part)))
      (ordered any-north-of extra bottom)
      (pairwise horizontal-overlap top extra bottom)
      (and-nothing-else))

    (defmodel se-concavity
      ()
      (allow s-part east-cavity)
      (allow c-part center-cavity)
      (allow e-part south-cavity)
      (test (or s-part e-part))
      (ordered south-neighbor-of s-part c-part)
      (ordered east-neighbor-of e-part c-part))

Figure 13. Typical Digit Model


2.4.7 ZIP Code Assembly


Finally, the results of the matching process are used to determine the ZIP
Code. Again, a simple approach was implemented. This approach is summarized in
Figure 14. As seen in this figure, the digit matcher is run on all hypothesized
digit regions. The results of this matching are then merged to assemble the
ZIP Code. If five digit regions possess the spatial qualities of a legal ZIP
Code and if each of the five regions matches a single digit model, then the five
digits are assembled into a ZIP Code. If all five digits were not identified
by the matcher, then the results of the matcher are reported. No further
attempt is currently made, however, to assemble the ZIP Code from the
partial matching data plus additional contextual information.
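
The assembly rule can be sketched in Python as follows. The sketch assumes
the matcher's output is one list of matched digit labels per hypothesized
region, ordered left to right, and it omits the check of the spatial
qualities of a legal ZIP Code, whose criteria are not given here; the
function name is hypothetical.

```python
def assemble_zip(region_matches):
    """Return a five-digit ZIP Code string, or None if assembly is not possible.

    region_matches holds one list of matched digit labels per hypothesized
    digit region, ordered left to right.
    """
    unambiguous = all(len(matches) == 1 for matches in region_matches)
    if len(region_matches) == 5 and unambiguous:
        return "".join(matches[0] for matches in region_matches)
    return None  # the partial results are reported; no contextual recovery


complete = assemble_zip([["4"], ["8"], ["1"], ["0"], ["7"]])
ambiguous = assemble_zip([["4"], ["8"], ["1", "7"], ["0"], ["7"]])
missing = assemble_zip([["4"], ["8"], [], ["0"], ["7"]])
```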


2.5  Discussion  of  Results 

The Phase I activities in Advanced Research in Recognition of Handwritten
Address ZIP Codes have been very fruitful. As discussed above, they have
produced four major outcomes: the OCR literature review, the handwritten
address digitized image data base, the hardware and software testbed for
investigating the recognition of handwritten addresses, and the prototype ZIP
Code recognition system. These four components create a solid foundation
upon which our future research in this area can be conducted in an efficient
and effective manner.

The OCR literature review has produced significant insight into past
approaches to character recognition. This insight has already played a major
role in formulating the overall approach to this problem. Several different
approaches have been taken to selected portions of the problem in the past.
Many of these approaches were seemingly successful, but none were
conducted in the context of an end-to-end system. Because of this, Phase I
researchers decided to develop a testbed concept that would allow rapid
development and continued refinement of a prototype ZIP Code recognition
system. This concept thus allows continued integration of different approaches
to various aspects of the recognition problem, including techniques which
may be developed by other USPS contractors.

The  handwritten  address  digitized  image  data  base  contains  over  S00 
diverse address images and continues to grow. SUNY at Buffalo has provided
a great majority of the digitized images. Their address sources include the
USPS main office, local post offices, and USPS OCR contractors. This data
base has facilitated the construction and initial testing of the end-to-end
ZIP Code reading system. Numeral models have been generated and refined
using the ZIP Codes contained in this data base. Using the methodology
developed for this project, the models can be quickly and easily evolved as
the data base becomes larger. This supports a form of digit learning in
which the acquired knowledge is complete, precisely describable, and
rigorous.

A hardware and software testbed was designed and implemented. Featured
within the testbed are an ERIM Cytocomputer and a Symbolics LISP machine.
These specialized computers provide the pixel and symbolic computational
throughput required to effectively develop and test recognition algorithms.
Residing in these computers are several layers of software designed to
minimize the effort required in developing vision algorithms. Together the
hardware and software components of the testbed create a powerful
development environment for exploring solutions to ZIP Code recognition.

An end-to-end prototype ZIP Code recognition system was developed. This
prototype consists of several processing phases: binary image generation,
last line extraction, character segmentation, feature generation, feature
segmentation, model matching, and ZIP Code assembly. An initial solution to
each processing phase was developed within this prototype system. Some of
these solutions are recognized as simplistic. However, their implementation
enables focused refinement of a complete recognition system. The software
as of Phase I completion is included in Appendices A and B.

These four Phase I outcomes establish a solid foundation for Phase II
research. The OCR literature review provides significant insight into
steering the research effort in directions that promise the most success.
The image data base provides the mechanism for testing and evaluating
potential problem solutions. The address understanding testbed enables
future work to be focused on problem solutions with minimal effort expended
on implementation issues. The end-to-end prototype system establishes a
baseline system in which each phase of the overall ZIP Code recognition
process can be evaluated and further developed. Together these Phase I
outcomes form the basis for accelerated Phase II research.


3  Phase  II  Overview 

In Phase II of the Advanced Research in Recognition of Handwritten Address
ZIP Codes, the performance capabilities of the prototype Phase I system
will be extended. An overview of the methodology that will be used during
this phase is seen in Figure 15. As seen in the figure, this methodology is
iterative in nature. The OCR literature review, the handwritten address
image data base, and the testbed will all play an integral part in this
process. The literature review will serve to generate and evaluate new
research ideas. Selected ideas will then be rapidly developed and
integrated into the prototype system using the powerful tools within the
testbed environment. The performance of the extended system will then be
evaluated on images from the handwritten image data base. The results of
this evaluation will then be used to focus the next iteration of
refinement.


3.1  Technical  Approach 

The testbed described above will be used in Phase II to refine the
prototype ZIP Code recognition system. The hardware and software address
recognition testbed was constructed in Phase I; in Phase II it will be used
to explore the intricacies of the problem and develop solutions. An
iterative refinement methodology will be used in this process. Under this
methodology, a set of test images will be processed through the current
version of the prototype ZIP Code recognition system. This processing will
produce performance data which can then be analyzed and used to focus
subsequent research directions. By using this methodology, the Phase II
research can be focused on the areas that show the most promise for
upgrading the overall system performance, so that effort is directed where
it yields the greatest improvement.
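As an illustration of the kind of performance data one iteration might produce, the following sketch scores a test run at both the digit and the ZIP Code level. It is a minimal example under stated assumptions: the function name `score_run`, the use of `?` for an unidentified digit, and the two reported rates are choices made here, not part of the prototype.

```python
from typing import Dict, Iterable, Optional, Tuple


def score_run(results: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, float]:
    """Score (true_zip, recognized_zip) pairs.

    recognized_zip may contain '?' for digits the matcher could not
    identify, or be None when nothing was reported for the image.
    """
    total_digits = correct_digits = 0
    total_zips = correct_zips = 0
    for truth, guess in results:
        if guess is None:
            guess = "?" * len(truth)
        total_zips += 1
        total_digits += len(truth)
        # Count positions where the recognized digit equals the true digit.
        hits = sum(1 for t, g in zip(truth, guess) if t == g)
        correct_digits += hits
        if len(guess) == len(truth) and hits == len(truth):
            correct_zips += 1
    return {
        "digit_rate": correct_digits / total_digits if total_digits else 0.0,
        "zip_rate": correct_zips / total_zips if total_zips else 0.0,
    }
```

Comparing these two rates between iterations indicates whether a change to one processing phase improved the complete system or only an intermediate result.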

Several research directions are already known to be prime candidates for
exploration in this phase of the research. Topics known to require further
attention include generating binary images, extracting slanted address
lines, segmenting touching digits, windowing digits with overlapping
bounding boxes, refining and expanding digit models, adding last line data
base information into ZIP Code assembly, and adding context information
into ZIP Code hypothesis formulation. Approaches are currently being
developed to address these problems. Additional issues are also expected to
arise throughout this research phase. Once identified, these issues will be
analyzed and prioritized based on the expected return on expended effort.

Figure  15.  R  &  D  Methodology  (Phase  I,  Phase  II,  Phase  III)

It  is  anticipated  that  each  processing  phase  of  the  prototype  system  will 
be  significantly  enhanced  during  Phase  II.  Much  of  the  anticipated  work  is 
outlined  in  the  short  subsections  that  follow.  Although  this  work  is  described 
in  some  detail,  it  does  not  include  all  of  the  anticipated  Phase  II  activities, 
since  new  activities  are  expected  to  arise  as  continued  software  development 
provides  insight  into  advanced  recognition  system  requirements.  This  is  the 
nature  of  research  and  development  of  this  kind. 

3.1.1  Binary  Image  Generation 

During  Phase  II  this  processing  phase  will  be  enhanced  by  developing  new 
techniques  to  handle  images  with  textured  envelopes  and  broken  characters. 

The current prototype solution performs adequately on most images. It does
not, however, perform well on images with textured backgrounds. Several
techniques exist in the computer vision literature for identifying textured
surfaces. These techniques will be explored and evaluated for performance
on this problem. Several experiments will be conducted, and the best
resulting technique will be integrated into the final system. One candidate
technique that appears very promising is to identify and remove very short
disconnected line segments from the image. An initial view of the test
images suggests that this simple technique may eliminate much of this
problem.
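One way to picture the candidate technique is as a size filter on connected components of the binary image. The sketch below is an assumption-laden illustration, not the planned Cytocomputer implementation: it treats a "very short disconnected line segment" as any 8-connected foreground component with fewer than `min_pixels` pixels, and the function name and threshold are hypothetical.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # 0 = background, nonzero = stroke


def remove_small_components(img: Image, min_pixels: int) -> Image:
    """Return a copy of img with small 8-connected components erased."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    out = [row[:] for row in img]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Erase components too small to be part of a character stroke.
            if len(component) < min_pixels:
                for y, x in component:
                    out[y][x] = 0
    return out
```

A pure size threshold also removes legitimate small marks, such as the detached top of a five, so any threshold would have to be tuned against the test images.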

The current prototype solution also creates a significant number of broken
characters. Some of these characters are broken because they are written
with two disconnected strokes. The five with a flying top is a prime
example. These cases must be modeled and identified by the matcher. Other
characters, however, are broken because of differences in intensity within
the stroke. Initial review of the test data shows that looped twos are
prime examples of this phenomenon. This is apparently caused by the fact
that one must decelerate the writing instrument to go around the loop, thus
creating a different intensity. This change in intensity is magnified when
part of the stroke is eliminated during thresholding. This will be a
problem for any simpleminded approach to creating the binary image. Further
analysis of this problem is required during Phase II.


3.1.2  Last  Line  Extraction 

During Phase II this processing phase will be enhanced by developing new
techniques to handle slanted address lines. The current prototype
histogramming technique fails on address blocks with slanted lines. Several
possible solutions to this problem are technically feasible. One technique
that offers much promise is model matching. Under this approach a ZIP block
model will be developed to locate regions on the address block that seem to
possess ZIP Code spatial qualities. This approach will be implemented and
tested early on in Phase II.
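The weakness of histogramming on slanted lines can be seen in a small sketch of row-projection line finding. This is an illustrative stand-in for the prototype procedure, not a transcription of it: the function names are hypothetical, and `min_count` plays a role loosely analogous to the cut depth used by the prototype's histogram step.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # 0 = background, nonzero = stroke


def find_text_lines(img: Image, min_count: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of rows with ink."""
    profile = [sum(1 for p in row if p) for row in img]
    lines = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_count and start is None:
            start = y  # a run of inked rows begins
        elif count < min_count and start is not None:
            lines.append((start, y))  # the run ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


def last_line(img: Image, min_count: int = 1) -> Optional[Tuple[int, int]]:
    """Return the row range of the lowest text line, or None if there is none."""
    lines = find_text_lines(img, min_count)
    return lines[-1] if lines else None
```

When two slanted lines occupy overlapping row ranges, no blank row separates them in the projection, so they merge into a single run and the last line cannot be isolated by this method.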


3.1.3  Character  Segmentation 

During Phase II this processing phase will be enhanced by developing new
techniques for isolating touching and intersecting ZIP Code characters. The
current prototype solution performs well on segmenting characters that have
no bounding box overlap. It also works well on segmenting touching
characters that are simply connected by a single stroke. Unfortunately,
there are numerous other cases in which this approach does not work well.
Approaches to segmenting touching characters have been presented in the
literature. These approaches appear to have some promise, although they are
far from offering a comprehensive solution. Experiments on these approaches
will be conducted, and application-specific enhancements will be explored.
From this work, one technique will be selected for integration into the
final system.

3.1.4  Feature  Generation


During Phase II feature generation will be enhanced. The current prototype
solution incorporates only concavity features even though skeleton and
endpoint features are computed by the Cytocomputer. Although concavity
features appear to be very powerful, they are limited in what they can
represent. At present, it appears that character recognition rates between
70 and 90 percent are possible with these features. Higher recognition
rates, however, will require incorporation of the additional features,
which provide specific information about the character stroke. Because of
the flexibility built into the testbed, this implementation will be fairly
straightforward, thus facilitating online development through
experimentation.
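To make the notion of a concavity feature concrete, the sketch below labels background pixels of a character window by the directions in which a stroke blocks them. The labeling rule is an assumption made for illustration: a background pixel blocked by stroke on three sides is labeled with the one side that remains open (`n`, `e`, `s`, `w`), and a pixel blocked on all four sides is labeled `c`. The report's listings name these classes but do not define them, so the actual Cytocomputer definitions may differ.

```python
from typing import List

Image = List[List[int]]  # 0 = background, nonzero = stroke


def label_concavities(img: Image) -> List[List[str]]:
    """Label each pixel: '#' stroke, 'n'/'e'/'s'/'w' cavity, 'c' enclosed, '.' none."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    labels = [["."] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if img[y][x]:
                labels[y][x] = "#"
                continue
            # Look along each of the four directions for a stroke pixel.
            blocked = {
                "n": any(img[k][x] for k in range(y)),
                "e": any(img[y][k] for k in range(x + 1, cols)),
                "s": any(img[k][x] for k in range(y + 1, rows)),
                "w": any(img[y][k] for k in range(x)),
            }
            open_sides = [d for d, hit in blocked.items() if not hit]
            if not open_sides:
                labels[y][x] = "c"  # stroke found in all four directions
            elif len(open_sides) == 1:
                labels[y][x] = open_sides[0]  # cavity opening toward that side
    return labels
```

A pixel labeled `c` by this ray test is not necessarily inside a closed loop; distinguishing a true hole from a deep pocket would need a connectivity test such as a flood fill from the window border.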


3.1.5  Feature  Segmentation 

There  are  no  current  plans  to  modify  the  prototype  feature  segmentation 
algorithm.  However,  it  is  possible  that  modifications  may  be  necessary  if 
data  structures  other  than  currently  supported  run-length  regions  become 
computationally  desirable. 


3.1.6  Model  Matching 

There will be significant modifications to the digit models during this
project phase. The current prototype models perform as expected and produce
character recognition rates between 70 and 90 percent. They are, however,
extremely limited in some areas. Additional features will be developed to
resolve these limitations. These features will then be incorporated into
the digit models. Throughout Phase II, this aspect of the overall system is
expected to focus the overall research directions. During this process, the
test images will play an important role in surfacing research issues and
measuring performance.

3.1.7  ZIP  Code  Assembly


There will be several enhancements to ZIP Code assembly during Phase II of
this project. The current prototype solution is simplistic in nature. It
only looks for five-digit character sequences. It does not check the last
line data base to see if a proposed ZIP Code exists. It does not use the
last line data base to resolve missing information. It does not use
additional contextual address block information to augment the digit
matching process. These limitations will be addressed during this phase of
the project. Solutions to the first two areas are fairly straightforward,
and will be implemented in the obvious manner. The last area, however, will
require significant thought and effort to identify techniques for reliably
locating and identifying contextual sources of information. The methodology
that will be employed here will be to study the content of the address
block to identify areas of possible exploitation. Promising areas will then
be analyzed from an image processing perspective to determine the
reliability of required feature extraction. Those areas that look most
promising will then be implemented and evaluated for performance on actual
data.
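The two data base checks mentioned above, confirming that a proposed ZIP Code exists and resolving missing digits, can be sketched as a single lookup. Everything in this sketch is hypothetical: the directory is a plain set of strings standing in for the last line data base, and the function name and candidate representation are introduced only for illustration.

```python
from typing import List, Sequence, Set


def resolve_zip(candidates: Sequence[Set[str]], directory: Set[str]) -> List[str]:
    """Return directory entries consistent with the per-position candidates.

    candidates holds one set of possible digits per position; an empty set
    means the matcher identified nothing at that position.
    """
    consistent = []
    for code in sorted(directory):
        if len(code) != len(candidates):
            continue
        # Every position must either be unknown or contain the entry's digit.
        if all(not opts or code[i] in opts for i, opts in enumerate(candidates)):
            consistent.append(code)
    return consistent
```

A single surviving entry resolves the ZIP Code, an empty result means the proposed digits do not correspond to any known code, and several survivors indicate that further contextual information, such as the city or state name, is needed.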


3.2  Management  Plan 

It is expected that Phase II will continue the work of Phase I. The current
research team will remain intact and will continue to work full time on
this project.

4  Conclusions


Two major topics in Advanced Research in Recognition of Handwritten Address
ZIP Codes have been discussed in this report. The first covers the work
performed under Phase I of this effort, in which four subjects were
explored. These subjects are a review of the optical character recognition
literature, the development of the address block image data base, the
development of a testbed for developing computer vision solutions to
address block understanding, and the development of a prototype ZIP Code
recognition system. The second major topic describes the planned Phase II
activities under this project. Under this topic a technical approach and
management plan were outlined that propose continued iterative refinement
of the prototype system as the major Phase II activity. The research team
feels that there is still much work that needs to be done on this problem.
They also feel that the approach that has been developed will prove to be
significant and will result in a positive research outcome.




5  References

1  USPS Request for Proposal, RFI-BL-004 BOA Task 8.

2  Smith, S.T., Schrader, M.E., Mitchell, B.T., Gillies, A.M., and Jacobus,
C.J., "Methodologies of Optical Character Recognition," ERIM Technical
Report ARYD S7-0S5.

3  Wallich, P., "Minis and Mainframes," IEEE Spectrum, January 1985.




Appendix  A 

Digit  Models 




[Digit model source listings, LISP files printed 21-MAY-1987 15:52. The listing text is illegible in this copy.]


Appendix  B 

Feature  Extraction 




_HSC000$DUA1:[OCR.AMG.SYS]XCAVELINE.DEF;5  6-APR-1987  10:05  Page 1

;** xcaveline in - process from raw image to caves of last line
;   in - input filename or number
procedure (in)

;  global  variable  cave  receives  final  result 
gdeclare  cave 

;  get  the  image  and  process  to  binary  image 
xgetbin  in 

;  get  the  last  line  in  a  fixed  image  size 
xgetfixedline 

; separate the characters on the last line
xseparate 

;  detect  features  in  the  separated  characters 
xprocseps 

;  get  the  result  from  global  variable  cave 

copy  cave 

endprocedure 
_HSC000$DUA1:[OCR.AMG.SYS]XGETBIN.DEF;6  7-APR-1987  11:02  Page 1

;** xgetbin im slice - get and slice an image -> bin (global)
;   im - input filename or number
;   slice - threshold
procedure (im, slice)
gray
setdef 15->slice        ; default threshold = 15
gdeclare bin            ; global variable to get binary image result
declare code            ; variable to hold stage programs
declare temp            ; temporary image storage

; get the raw image from disk
unsave im

; save it in temp
copy ->temp
gdeclare raw
copy ->raw

; estimate background level by closing raw by a cylinder of radius 20
loadcode 'closecyl20.noc' -> code
apply code
empty -> code

; subtract raw image from the background image
diffimages active temp -> active

; threshold giving fg (strokes) state 5, and background state 0
gdeclare diffimg
copy -> diffimg
slice 0 slice 0 5
color

; get rid of long horizontal things
spanv 0 5 2 110 40 ->,code
spanv 5 2 5 110 40 ->,code
cover 5 0 ->,code
cover 2 5 ->,code
apply code

; save binary image in global variable bin
copy ->bin

endprocedure
_HSC000$DUA1:[OCR.AMG.SYS]XGETFIXEDLINE.DEF;1  1-APR-1987  13:29  Page 1

;** xgetfixedline - get last line of address block in fixed image size
;   assumes binary image with fg=5 bg=0
procedure (wheight, wwidth)
setdef 50->wheight      ; default image height = 50
setdef 400->wwidth      ; default image width = 400
declare ltop, lbot, lheight, extra, temp, left
gdeclare line, elw, eheight, ewidth, top, bot

; save image in temp
copy ->temp

; migrate pixels left to form accumulated width histogram
migleft
gdeclare lhist
copy -> lhist

; call xlastline to find top and bottom of last line
xlastline 30 -> ltop, lbot

; compute fixed size window placement so as not to go off image
lheight := lbot-ltop
extra := (wheight-lheight)/2
if (extra>0)
  ltop := ltop-extra
endif
if (ltop>imglen-wheight)
  ltop := imglen-wheight
endif
eheight := wheight      ; global variable eheight = window height (will be used
ewidth := wwidth
top := ltop
bot := ltop+eheight
left := imgwid-wwidth

; define the extended line window
window wheight wwidth top left -> elw

; get the binary image back
copy temp

; copy the last line window to global variable line
copy elw -> line

; make the line image currently active
copy line

endprocedure
_HSC000$DUA1:[OCR.AMG.SYS]XLASTLINE.DEF;1  1-APR-1987  13:34  Page 1

;** xlastline slop - find last line given "histogram" image
;   slop - the amount to go from the left edge before cutting lines apart
procedure (slop) -> top, bot

; span in from left edge of "histogram" turning to state 2
spanv 0 5 2 10 slop
gdeclare lcut
copy ->lcut

; remove the leftmost part
cover 2 0

; span back toward right making fixed height boxes in state 2
spanv 5 0 2 100 slop+1
gdeclare lbox
copy ->lbox

; remove the rightmost parts of the histogram
cover 5 0

; merge lines which are very close together
spanv 2 0 2 2002 3
spanv 0 2 0 2002 3
gdeclare lmerge
copy -> lmerge

; eliminate leftover short lines
spanv 0 2 0 2002 3
spanv 2 0 2 2002 3
gdeclare lprune
copy ->lprune

; make a very narrow window to speed up pixel scanning
declare lw
window imglen 1 1 1 -> lw
activate lw

; intro to loop to find top and bottom of each line in image
declare oldtop pix
iscan 2 2 1 1 -> top pix    ; this finds the top of the first line

; loop finding top and bottom of each successive line
repeat
  oldtop := top
  iscan 0 0 top pix -> bot pix
  iscan 2 2 bot pix -> top pix
until (top=0)

; now we have top and bot values
top := oldtop

; reactivate the binary image
activate scratch
endprocedure
_HSC000$DUA1:[OCR.AMG.SYS]XSEPERATE.DEF;1  27-MAY-1987  10:53  Page 1

;** xseperate slop - separate a line of text into single chars
;   slop - height at which to cut histogram to separate characters
procedure (slop)
setdef 4->slop          ; default slop value = 4
gdeclare eheight, seps
declare temp, code, trim, wind
trim := 10

; save the current image
copy -> temp

; migrate pixels down to form accumulated height "histogram"
migdown
gdeclare dhist
copy -> dhist

; span up from bottom amount slop turning to state 2
spanv 0 5 2 2000 slop
gdeclare dcut
copy ->dcut

; remove bottom part of histogram
cover 2 0

; extend the remaining histogram to be full window in height
spanv 5 0 5 2002 eheight
gdeclare dbox
copy ->dbox

; mark windows narrow enough to merge in state 4
spanv 0 5 4 110 4
spanv 5 4 5 110 4
gdeclare dnarrow
copy ->dnarrow

; find narrow gaps with narrow windows on either side - state 2
spanv 4 0 3 10 5
spanv 4 3 2 100 5

; return wider gaps to state 0
cover 3 0

; turn narrow windows back to state 5
cover 4 5

; now turn all windows to state 2 (narrow gaps are already in state 2)
cover 5 2
gdeclare dmerge
copy -> dmerge

; make a non-zero background state to do skeletonizing
cover 0 1

; skeletonize background thus extending windows without merging them
skelrec8 1 2 3 off off

; return windows to full height (lost in skeletonization)
spanv 1 2 1 2002 5

; return background to state 0
cover 1 0
gdeclare dthick
copy ->dthick

; add the original binary last line to windows image
addimages temp
gdeclare dchars
copy -> dchars

; now we have characters in windows but they extend beyond windows sometimes
; remove strokes which fall outside windows
cover 5 0
gdeclare wspill
copy ->wspill

; remove small strokes which touch borders of windows
spanr 0 7 1 trim
gdeclare wpoke
copy -> wpoke
spanr 7 1 7 (2*trim)
cover 1 2
gdeclare wtrim
copy ->wtrim

; trim all strokes to be at least 1 pixel away from the window's edge
spanr 0 7 2 1
cover 2 8

; remove windows which have no strokes in them
spanr 7 8 1 100
cover 8 0
cover 1 8

; copy the result to global variable seps
copy -> seps

endprocedure
_HSC000$DUA1:[OCR.AMG.SYS]XPROCSEPS.DEF;1  1-APR-1987  14:00

;** xprocseps - process seps image for ocr
;   assume stroke=7 window=8 border=0
procedure (tl, np)
setdef 2 -> tl          ; default trim-length 2
setdef 2 -> np          ; default number of passes 2
gdeclare skel cave

; eliminate very small (single 2112) fg things
spanv 8 7 5 2112 1
spanr 7 5 7 30
cover 5 8
gdeclare wclean
copy ->wclean

; eliminate small bumps on fg objects
tranbx 7787

; skeletonize the fg
skelrec4 7 3 1
skelrec8 7 3 10
cover 3 8
gdeclare wskel
copy -> wskel

; trim skeletons using the simplify function (assumes fg=2)
cover 7 2
markboth 2 3 4
gdeclare wmark
copy -> wmark
cover 3 2
cover 4 2
simplify tl np

; mark endpoints=3 and junctions=4
markboth 2 3 4
copy ->skel

; return fg to 7
cover 2 7
cover 3 7
cover 4 7

; thicken and 4-way skeletonize (this
tranbx 7 8 7 2100 1 off
tranbx 7 8 7 2010 1 off
skelrec4 7 8 2
gdeclare thickskel
copy -> thickskel

; detect concavities using the caves function
xcaves

; leave result in global variable cave
copy ->cave
endprocedure


_HSC000$DUA1:[OCR.AMG.WORK]MARKBOTH.DEF;5  20-JAN-1987  14:54  Page 1

;** markboth fg ends juncs - mark endpoints and junctions in different
procedure (fg, ends, juncs)
declare  cache 

tranbx  fg  fg  ends  6  1  on  If  or 

tranbx  fg  fg  ends  3  1  on  it  or 

tranbx  fg  fg  ends  H  on  If  or 
tranbx  fg  fg  ends  1  1  on  if  or 

tranbx  fg  fg  ends  2  1  on  If 


copy  -> cache 
cover  ends  fg 
markpoints  fg  juncs 

pixelselect whereever (cache=ends) -> ends
endprocedure 
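
The idea of marking skeleton endpoints and junctions can be illustrated outside C4PL. The Python sketch below is a stand-in technique, not a translation of the tranbx rules above: it assumes a 0/1 skeleton grid and simply counts 8-connected neighbours, calling a pixel an endpoint when it has exactly one neighbour and a junction when it has three or more. The function name `mark_points` and the grid representation are hypothetical.

```python
def mark_points(skeleton):
    """Return (endpoints, junctions) as sets of (row, col) on a 0/1 skeleton grid."""
    rows, cols = len(skeleton), len(skeleton[0])
    endpoints, junctions = set(), set()
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue  # only skeleton pixels can be endpoints or junctions
            # Count the 8-connected skeleton neighbours that lie inside the grid.
            neighbours = sum(
                skeleton[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            if neighbours == 1:
                endpoints.add((r, c))
            elif neighbours >= 3:
                junctions.add((r, c))
    return endpoints, junctions


# A small "Y" shaped skeleton: two diagonal arms meeting a vertical stem.
y_shape = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
ends, juncs = mark_points(y_shape)
```

Simple neighbour counting can report clusters of junction pixels where strokes cross at right angles, which is one reason a rule-based marking step may be preferred in practice.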
_HSC000$DUA1:[OCR.AMG.WORK]SIMPLIFY.DEF;7  1-APR-1987  14:03

;**  -  simplify  -  simplify  8-way  connected  skeletons
;  assumes  fg=2  bg=8
procedure  (trim_length, npasses)
setdef  3  ->  trim_length
setdef  4  ->  npasses

declare  fg  bg  endpoint  jpoint  temp  i  scode

fg := 2
bg := 8
endpoint := 3
jpoint := 4
temp := 5

spanr  endpoint  fg  temp  trim_length  ->,scode
spanr  fg  temp  fg  (trim_length+2)  ->,scode
spanr  temp  endpoint  temp  2  ->,scode
cover  temp  bg  ->,scode
spanr  jpoint  endpoint  bg  2  ->,scode
cover  endpoint  fg  ->,scode
cover  jpoint  fg  ->,scode
skelrec8  fg  bg  3  ->,scode
for  i  1  npasses
markboth  fg  endpoint  jpoint
apply  scode
endfor

endprocedure


_HSC000$DUA1:[OCR.AMG.SYS]XCAVES.DEF;1  1-APR-1987  14:06

;**  -  xcaves  -  set  up  colors  and  call  cavities  to  mark  n  s  e  w.
;  assume  stroke=7  window=8  border=0
;  return  stroke=240  window=0  border=10
;  n=1  e=2  s=3  w=4  c=7  hole=6 (orange)

procedure 

declare  code 

cover  0  64  -> , code 

cover  8  0  -> , code 

cover  7  128  ->,code 

apply  code 

cavities 

empty  ->code 

cover  128  240 

cover  5  7 

cover  64  10 

gdeclare  wcave

copy  ->  wcave

spanv  0  7  orange  2112  30  ->,code 

spanv  1  7  orange  2112  30  ->,code 

spanv  2  7  orange  2112  30  ->,code 

spanv  3  7  orange  2112  30  ->,code 

spanv  4  7  orange  2112  30  ->,code 

exch  7  orange  ->,code
apply  code 
endprocedure 


_HSC000$DUA1:[OCR.AMG.SYS]CAVITIES.DEF;5  11-MAY-1987  17:58  Page  1

;**  -  cavities  -  mark  concavities  in  numerals
;  assume  stroke=128  window=0  border=64
;  n=1  e=2  s=3  w=4  c=5  (to  view  cover  5  130,  cover  128  200)
procedure 

gdeclare  cavecode, cavecodeswitch
declare  size
size := 100

setdef  FALSE  ->  cavecodeswitch

if  (cavecodeswitch  <>  TRUE)  ;  if  cavecode  does  not  already  exist,  make
cavecodeswitch  :=  TRUE

spanv  128  0  1  2000  size  ->,cavecode  ;  span  up  in  lowest  bit  plane

bitdisab  0

spanv  128  0  2  10  size  ->,cavecode  ;  span  right  in  next  plane

bitdisab  0,1

spanv  128  0  4  2  size  ->,cavecode  ;  span  down  in  next  plane
bitdisab  0,1,2

spanv  128  0  8  100  size  ->,cavecode  ;  span  left  in  final  bitplane

bitmask  255

;  using  covers  reduce  the  16  state  cavities  to  just  1, 2, 3, 4, and  7

;  for  n, e, s, w  and  center  cavities

cover  1  0  ->, cavecode 

cover  2  0  ->, cavecode 

cover  3  0  ->, cavecode 

cover  4  0  ->, cavecode 

cover  5  0  ->, cavecode 

cover  6  0  ->, cavecode 

cover  8  0  ->, cavecode 

cover  9  0  ->, cavecode 

cover  10  0  ->, cavecode 

cover  12  0  ->, cavecode 

cover  11  1  ->, cavecode 

cover  7  2  ->, cavecode 

cover  14  3  ->, cavecode 

cover  13  4  ->, cavecode 

cover  15  5  ->, cavecode 

endif 

apply  cavecode  1  ;  apply  cavecode  to  mark  cavities 
endprocedure 
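
The bit-plane scheme used by cavities can be illustrated outside C4PL. The Python sketch below is an illustration only, not the ERIM implementation. It assumes a binary grid (1 = stroke, 0 = window) and assumes that "span up" means the stroke value is propagated upward, so a window pixel receives bit 1 when a stroke lies below it within `size` pixels (and similarly for the other three directions). Only the five states kept by the cover table (11, 7, 14, 13 and 15) receive a label. The names `stroke_within`, `mark_cavities` and `LABELS` are hypothetical.

```python
# Illustrative sketch only; names and the grid representation are assumptions.
UP, RIGHT, DOWN, LEFT = 1, 2, 4, 8  # one bit plane per span direction

# States kept by the cover table: open to the north, east, south, west, or enclosed.
LABELS = {11: 1, 7: 2, 14: 3, 13: 4, 15: 5}


def stroke_within(grid, r, c, dr, dc, size):
    """Return True if a stroke pixel lies within `size` steps of (r, c) along (dr, dc)."""
    rows, cols = len(grid), len(grid[0])
    for _ in range(size):
        r += dr
        c += dc
        if not (0 <= r < rows and 0 <= c < cols):
            return False
        if grid[r][c]:
            return True
    return False


def mark_cavities(grid, size=100):
    """Label each non-stroke pixel 0 (no cavity) or 1..5 (n, e, s, w, center)."""
    out = [[0] * len(row) for row in grid]
    for r, row in enumerate(grid):
        for c, pixel in enumerate(row):
            if pixel:
                continue  # stroke pixels are not cavity candidates
            state = 0
            if stroke_within(grid, r, c, 1, 0, size):   # stroke below: reached by spanning up
                state |= UP
            if stroke_within(grid, r, c, 0, -1, size):  # stroke to the left: spanning right
                state |= RIGHT
            if stroke_within(grid, r, c, -1, 0, size):  # stroke above: spanning down
                state |= DOWN
            if stroke_within(grid, r, c, 0, 1, size):   # stroke to the right: spanning left
                state |= LEFT
            out[r][c] = LABELS.get(state, 0)
    return out


# A "U" shape: the inner pixels are blocked below, left and right, but open to the north.
u_shape = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
u_labels = mark_cavities(u_shape)

# A closed ring: the middle pixel is blocked on all four sides.
ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
ring_labels = mark_cavities(ring)
```

In this sketch the two inner pixels of the "U" are labelled 1 (north cavity) and the middle of the ring is labelled 5 (center), matching the n=1 and c=5 convention in the listing's header comment.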




_HSC000$DUA1:[OCR.AMG.TOOLS]HMIGRATE.DEF;3  27-MAY-1987  11:09

;**  hmigrate  fore, passes, in  -  horizontal  migration
;  note:  fore  cannot  be  0  (zero)
procedure  (fore, passes, inimage)  ->  outimage

setdef  ACTIVE  ->  inimage
setret  ACTIVE  ->  outimage

declare  nlines, npixels
show_size  inimage  ->  nlines, npixels

setdef  5  ->  fore
setdef  npixels  ->  passes

cover  ~(fore)  0  inimage  ->  outimage
cover  fore  2  inimage  ->  outimage

declare  code
cover  0  1  ->,code
spanv  0  1  2  10  1  ->,code
cover  1  0  ->,code
bitor  1  0  ->,code
bitmask  1
max3d  000  200  000  1  ->,code
bitmask  255
bitor  1  2  ->,code
bitmask  4
max3d  000  002  000  1  ->,code
bitmask  255
cover  0  0  ->,code
cover  1  0  ->,code
cover  2  0  ->,code
cover  3  8  ->,code
cover  4  8  ->,code
cover  5  8  ->,code
cover  6  0  ->,code
cover  7  8  ->,code
cover  8  2  ->,code

apply  code,  passes,  inimage  ->  outimage
cover  2  fore  inimage  ->  outimage
endprocedure
_HSC000$DUA1:[OCR.AMG.TOOLS]VMIGRATE.DEF;4  27-MAY-1987  11:09

;**  vmigrate  fore, passes, in  -  vertical  migration
;  note:  fore  cannot  be  0  (zero)
procedure  (fore, passes, inimage)  ->  outimage

setdef  ACTIVE  ->  inimage
setret  ACTIVE  ->  outimage

declare  nlines, npixels
show_size  inimage  ->  nlines, npixels

setdef  5  ->  fore
setdef  nlines  ->  passes

cover  ~(fore)  0  inimage  ->  outimage
cover  fore  2  inimage  ->  outimage

declare  code
cover  0  1  ->,code
spanv  0  1  2  2000  1  ->,code
cover  1  0  ->,code
bitor  1  0  ->,code
bitmask  1
max3d  000  000  020  1  ->,code
bitmask  255
bitor  1  2  ->,code
bitmask  4
max3d  020  000  000  1  ->,code
bitmask  255
cover  0  0  ->,code
cover  1  0  ->,code
cover  2  0  ->,code
cover  3  8  ->,code
cover  4  8  ->,code
cover  5  8  ->,code
cover  6  0  ->,code
cover  7  8  ->,code
cover  8  2  ->,code

apply  code,  passes,  inimage  ->  outimage
cover  2  fore  inimage  ->  outimage
endprocedure


_HSC000$DUA1:[OCR.AMG.TOOLS]MIGDOWN.DEF;1  13-MAR-1987  13:30

;**  migdown  fore, passes, in  -  vertical  migration  using  vcode
;  note:  fore  cannot  be  0  (zero)
procedure  (fore, passes, inimage)  ->  outimage

setdef  ACTIVE  ->  inimage
setret  ACTIVE  ->  outimage

declare  nlines, npixels
show_size  inimage  ->  nlines, npixels

setdef  nlines  ->  passes
setdef  5  ->  fore

cover  ~(fore)  0  inimage  ->  outimage
cover  fore  2  inimage  ->  outimage

gdeclare  vcode

apply  vcode,  passes,  inimage  ->  outimage
cover  2  fore  inimage  ->  outimage
endprocedure
_HSC000$DUA1:[OCR.AMG.TOOLS]MIGLEFT.DEF;1  18-MAR-1987  13:30

;**  migleft  fore, passes, in  -  horizontal  migration  using  hcode
;  note:  fore  cannot  be  0  (zero)
procedure  (fore, passes, inimage)  ->  outimage

setdef  ACTIVE  ->  inimage
setret  ACTIVE  ->  outimage

declare  nlines, npixels
show_size  inimage  ->  nlines, npixels

setdef  npixels  ->  passes
setdef  5  ->  fore

cover  ~(fore)  0  inimage  ->  outimage
cover  fore  2  inimage  ->  outimage

gdeclare  hcode

apply  hcode,  passes,  inimage  ->  outimage
cover  2  fore  inimage  ->  outimage
endprocedure
OCR SUNY Address Tape Processing Report
F. Quek
April 7, 1987


This report describes the set of programmes and DCL command files that
have been implemented to extract the OCR address label images received from
SUNY. It also explains how to use these programmes and command files.

The  Data 

The data we receive from SUNY is in a compressed (run-length encoded) format.
Each image is preceded on the tape by a header file which describes the image.
This header file is in ASCII and is formatted as follows:

image name
Number of Rows: ####
Number of Columns: ####

where

image name is a valid VMS file name

#### are integers specifying the number of rows and columns.
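
As an illustration of how such a header could be read programmatically, here is a minimal Python sketch. It assumes the labels read exactly "Number of Rows:" and "Number of Columns:" and that the image name occupies the first non-blank line. The function name `parse_header` and the sample header contents are hypothetical and are not taken from the SUNY tapes.

```python
import re


def parse_header(text):
    """Parse an image header into (name, rows, columns)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("header needs a name line and two dimension lines")
    name = lines[0]
    fields = {}
    for line in lines[1:]:
        match = re.match(r"Number of (Rows|Columns):\s*(\d+)$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    if set(fields) != {"Rows", "Columns"}:
        raise ValueError("header is missing a row or column count")
    return name, fields["Rows"], fields["Columns"]


# Hypothetical header contents, made up for illustration.
example = "ADDR0001.IMG\nNumber of Rows: 480\nNumber of Columns: 640\n"
name, rows, cols = parse_header(example)
```

The row and column counts recovered this way are the values that the later extraction steps need.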

Data  Extraction  Procedure 

The input tape has to be MOUNTed using the foreign option and specifying
the blocksize to be 16384. Data on the tape can then be read using the standard
VMS COPY command. The mounting command is as follows:

$ MOUNT/foreign/bl=16384 MTAx

After reading the two files (header and image), the image must first be put into
a fixed record format. This can be done using the VMS CONVERT facility. The
necessary incantation is:

$ CONVERT/pad/fdl=FIXED.FDL Sourcefile Targetfile

FIXED.FDL is a VMS FDL declaration file, the listing of which can be found
in the appendix of this report.
The COMPRESS.EXE programme supplied by SUNY can now be run. This
programme expects an input file name with a '.Z' extension. The output of the
programme is a file of the same name with no extension. The programme can be
run as follows:

$ COMPRESS -d filename

This will decompress the file filename.Z, yielding a decompressed image file
named filename..

The resulting image must be put in a fixed record format to be accessible
from C4PL. The programme to do this is FIXQ.EXE. Besides putting the image
into fixed record format, FIXQ also downsamples the image by a factor of 4 if
the original image is larger than 512 KByte in size (the limit of C4PL) and trims
the image to make the dimensions even (the current implementation of C4PL has
a bug which precludes operation on images with an odd number of columns). To
execute this programme, type

$ FIXQ Sourcefile Targetfile rows columns

where rows and columns are integers specifying the number of rows and columns
in the original image. This information is available in the header file described
earlier.
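
The size rules that FIXQ applies can be sketched in a few lines of Python. This is not the FIXQ source. It assumes one byte per pixel (so the image size in bytes is rows times columns), and it assumes that the factor of 4 applies along each axis, which the text does not state. The function name `plan_fixq` and its parameters are hypothetical.

```python
def plan_fixq(rows, cols, limit_bytes=512 * 1024, factor=4):
    """Return the (rows, cols) an image would have after downsampling and trimming."""
    # Assumption: one byte per pixel, so the image size in bytes is rows * cols.
    if rows * cols > limit_bytes:
        # Assumption: the downsampling factor applies along each axis.
        rows //= factor
        cols //= factor
    # Trim one row or column where needed so that both dimensions are even.
    rows -= rows % 2
    cols -= cols % 2
    return rows, cols


small = plan_fixq(481, 641)    # under the limit: only trimmed to even dimensions
large = plan_fixq(2048, 2048)  # over the limit: downsampled first
```

Under these assumptions a 481 by 641 image is only trimmed (to 480 by 640), while a 2048 by 2048 image is reduced to 512 by 512.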

Another version of FIXQ exists to facilitate the extraction of the images in
unattended mode. FIXQQ.EXE takes as input the source image file and the
header file. It reads the name of the target file, the number of rows and the number
of columns from the header file. FIXQQ can be activated as follows:

$ FIXQQ imagefile headerfile

RDCOMP.COM is a VMS DCL file which permits the processing of the
images in batch mode. It will read a tape image by image and process each one,
leaving the final image in the default directory. Several lines in RDCOMP.COM
have to be altered for each run (to specify the physical tape drive on which the tape
is mounted, the target directory of the images, etc.). Instructions on the necessary
changes are contained in the in-code documentation of RDCOMP.COM.

Required  Files  and  Programmes 

The required files and programmes are listed below for ease of reference.

RDCOMP.COM - The VMS DCL file which permits the processing of an
entire tape in batch mode.

COMPRESS - The image compression/decompression programme provided
by SUNY.

FIXQ - The programme which puts the image in a format readable by C4PL,
downsampling and trimming the image as necessary.

FIXQQ - Similar to FIXQ except that it permits operation in unattended
mode.

FIXED.FDL - The VMS File Description Language file to be used with the
VMS CONVERT command.

The source code of the above is appended to this report.
Appendix D

Processing  Stages 
Processing Description


The first image shows the original grey-level image displayed in grey tones.
To threshold this image, a threshold level is determined by filtering out the
high-frequency address information with a morphological operation to give
an approximation to the background, and subtracting this background from
the original image. The resultant image possesses only foreground (address)
information (the second image). The third image shows the result of
thresholding. To determine address lines, a line density image is created: all
pixels in the thresholded image are shifted to the right, and the resultant peaks
are separated to give line locations, as shown in the fourth image. The
red bar at the right is used to separate peaks from each other. Five lines
are detected in this image: the name, title, and business lines (not shown),
the 1300 Boeing Drive W line, and the Itasca, Illinois 60143 line. The
fifth image shows the last address line. As a simplistic first guess, the ZIP
Code is assumed to be on the last line. To quickly and easily separate
characters, a character density image is created: all pixels in the last
line are shifted down, and the resultant peaks are separated to give character
locations using a method similar to the line separating algorithm. The sixth
and seventh images, continuations of the fifth, show further steps in the
character segmentation algorithm. The result of the separation algorithm,
along with concavity features of the last line, is shown at the very bottom of
the seventh image. The eighth image shows an enlargement of the separated
ZIP Code block with colored concavity features.
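
The line and character density steps described above amount to projection profiles: shifting every foreground pixel of a row to the right produces a bar whose length is that row's pixel count, and shifting pixels down does the same for columns. The Python sketch below illustrates the idea on a tiny made-up binary image. It is not the C4PL migration code, and the function names, the `min_count` cut-off (a stand-in for the role the red bar plays in the fourth image) and the sample image are all assumptions.

```python
def row_density(binary):
    """Count foreground pixels in each row (the bar length after migrating right)."""
    return [sum(row) for row in binary]


def column_density(binary):
    """Count foreground pixels in each column (the bar height after migrating down)."""
    return [sum(col) for col in zip(*binary)]


def find_runs(profile, min_count=1):
    """Return (start, end) index pairs, end exclusive, where profile >= min_count."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i                      # a peak begins
        elif value < min_count and start is not None:
            runs.append((start, i))        # the peak ended on the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Made-up 6x6 binary image with two "text lines" separated by a blank row.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
]

lines = find_runs(row_density(image))            # row ranges of the detected lines
top, bottom = lines[-1]                          # first guess: ZIP Code on the last line
characters = find_runs(column_density(image[top:bottom]))
```

Here the two lines occupy rows 1 to 2 and rows 4 to 5, and the last line splits into two character blocks at columns 1 to 2 and column 5. Touching or overlapping characters would not separate this cleanly, which is consistent with the text describing this step as a quick first pass.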